DATA PROCESSOR WITH AN IMPROVED DATA DEPENDENCE DETECTOR

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to a data processor, and more
particularly to a data processor with an improved data dependence
detector.

2. Description of the Related Art 

A non-program sequence execution, or out-of-order execution, has
widely been used for high speed processing, wherein the instructions
are executed in a sequence or order different from the definitive
sequence or order defined by a program. In accordance with the
non-program sequence execution, the processor executes an instruction
which has become executable prior to an instruction which has not yet
become executable, even if the non-executable instruction is prior in
program sequence to the executable instruction, thereby improving the
performance of the processor as compared to when the processor
executes instructions in the program sequence defined by the program,
namely in-order execution.

The condition for allowing the non-program sequence execution is that
no read after write dependence is present between the instructions
with reference to a register. The read after write dependence may also
be referred to as a flow dependence.

If a post instruction, which is post in program sequence to a prior
instruction, refers to a register which is changed by the prior
instruction, this means that a read after write dependence from the
prior instruction to the post instruction is present. If the processor
executes the post instruction and then the prior instruction in
violation of the read after write dependence, then the meaning of the
program is changed and it is no longer possible to obtain the correct
execution result. Namely, if the read after write dependence is
present with respect to the register, it is impossible to execute the
instructions in the non-program sequence.

If a prior instruction, which is prior in program sequence to a post
instruction, refers to a register which is changed by the post
instruction, this means that a write after read dependence from the
prior instruction to the post instruction is present. The write after
read dependence may also be referred to as an anti-dependence. If the
processor executes the post instruction and then the prior instruction
in violation of the write after read dependence, then the meaning of
the program is changed and it is no longer possible to obtain the
correct execution result. Namely, if the write after read dependence
is present with respect to the register, it is impossible to execute
the instructions in the non-program sequence.

If a post instruction, which is post in program sequence to a prior
instruction, changes a register which is also changed by the prior
instruction, this means that a write after write dependence from the
prior instruction to the post instruction is present. The write after
write dependence may also be referred to as an output dependence. If
the processor executes the post instruction and then the prior
instruction in violation of the write after write dependence, then the
meaning of the program is changed and it is no longer possible to
obtain the correct execution result. Namely, if the write after write
dependence is present with respect to the register, it is impossible
to execute the instructions in the non-program sequence.
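The three kinds of register dependence described above can be
illustrated by the following minimal Python sketch. The instruction
names, register names and the helper functions are hypothetical and
serve only as an illustration; the sketch compares the registers
referred to and changed by a prior instruction and a post instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    reads: frozenset   # registers the instruction refers to
    writes: frozenset  # registers the instruction changes


def register_dependences(prior: Instr, post: Instr) -> set:
    """Return the kinds of dependence from `prior` to `post`."""
    kinds = set()
    if prior.writes & post.reads:
        kinds.add("read after write")    # flow dependence
    if prior.reads & post.writes:
        kinds.add("write after read")    # anti-dependence
    if prior.writes & post.writes:
        kinds.add("write after write")   # output dependence
    return kinds


def may_reorder(prior: Instr, post: Instr) -> bool:
    """Reordering is safe only if no dependence of any kind is present."""
    return not register_dependences(prior, post)


# Hypothetical instructions, listed in program sequence.
i_add = Instr("add r1 <- r2, r3", frozenset({"r2", "r3"}), frozenset({"r1"}))
i_sub = Instr("sub r4 <- r1, r5", frozenset({"r1", "r5"}), frozenset({"r4"}))
i_mov = Instr("mov r2 <- r6", frozenset({"r6"}), frozenset({"r2"}))

print(register_dependences(i_add, i_sub))
print(register_dependences(i_add, i_mov))
print(may_reorder(i_sub, i_mov))
```

In this sketch "sub" reads the register written by "add" (read after
write), "mov" writes a register read by "add" (write after read), and
"sub" and "mov" share no register and may be executed in either order.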

When an instruction is decoded, the register referred to or changed by
the instruction is identified, so that instructions having the read
after write dependence can readily be executed in the program
sequence.

It has been known to cancel the write after read dependence and the
write after write dependence for allowing the non-program sequence
execution.

Instructions which access a memory are subject not only to the
dependence on the register but also to the dependence on the memory.

Usually, the memory access instructions include a load instruction for
reading out data from the memory and a store instruction for writing
data into the memory.

If two memory access instructions have different addresses from each
other, no dependence is present with respect to the memory. This
allows such instructions to be executed in the non-program sequence.

If two memory access instructions, at least one of which is a store
instruction, have the same address as each other, a dependence is
present with respect to the memory. If, for example, a load
instruction reads out data from an address to which the data are
stored by a store instruction which is prior in program sequence to
the load instruction, then this means that a read after write
dependence from the store instruction to the load instruction is
present. In this case, if the instructions are executed in a reverse
sequence to the program sequence, then the program meaning is changed
and it is no longer possible to obtain the correct result of the
execution of the program. Namely, if the read after write dependence
is present with respect to the memory, it is impossible to execute the
instructions in the non-program order.

If, for example, a store instruction stores data to an address from
which the data have been read out by a load instruction which is prior
in program sequence to the store instruction, then this means that a
write after read dependence from the load instruction to the store
instruction is present. In this case, if the instructions are executed
in a reverse sequence to the program sequence, then the program
meaning is changed and it is no longer possible to obtain the correct
result of the execution of the program. Namely, if the write after
read dependence is present with respect to the memory, it is
impossible to execute the instructions in the non-program order.

If, for example, a store instruction stores data to an address to
which the data have been stored by a store instruction which is prior
in program sequence to the store instruction, then this means that a
write after write dependence from the prior store instruction to the
post store instruction is present. In this case, if the instructions
are executed in a reverse sequence to the program sequence, then the
program meaning is changed and it is no longer possible to obtain the
correct result of the execution of the program. Namely, if the write
after write dependence is present with respect to the memory, it is
impossible to execute the instructions in the non-program order.
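The memory dependences described above can be sketched in Python as
follows. This is a minimal illustration only; the tuple layout, the
"LD"/"ST" labels and the address strings are assumptions made for the
sketch and do not come from any real instruction set.

```python
def memory_dependence(prior, post):
    """Classify the dependence from `prior` to `post`.

    Each argument is an (operation, address) tuple, where the operation
    is "LD" for a load instruction or "ST" for a store instruction.
    Returns the kind of dependence, or None if the two instructions may
    be executed in the non-program order.
    """
    prior_op, prior_addr = prior
    post_op, post_addr = post
    if prior_addr != post_addr:
        return None  # different addresses: no dependence on the memory
    if prior_op == "ST" and post_op == "LD":
        return "read after write"
    if prior_op == "LD" and post_op == "ST":
        return "write after read"
    if prior_op == "ST" and post_op == "ST":
        return "write after write"
    return None  # two loads only read the memory


print(memory_dependence(("ST", "A4"), ("LD", "A4")))
print(memory_dependence(("LD", "A4"), ("ST", "A4")))
print(memory_dependence(("ST", "A4"), ("LD", "A2")))
```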

It has been known to cancel the write after read dependence and the
write after write dependence by temporarily storing data, which are to
be stored by the store instruction, into a store buffer for allowing
the non-program sequence execution.

If the read after write dependence is present, it is necessary to
execute the instructions in the program sequence. It is, however,
likely that the address of a load/store instruction remains unknown
until the instruction is about to be executed. Namely, it is likely
that the dependence remains unknown until the execution of the
instruction. For this reason, inhibiting the non-program sequence
execution causes a large disadvantage in the performance of the
processor.

FIG. 1A is a diagram illustrative of a program sequence of store/load
instructions. The program sequence consists of, in order, a load
instruction "LD1" at an address "A1", a store instruction "ST1" at an
address "A4'", a load instruction "LD2" at an address "A2", a load
instruction "LD3" at an address "A3", and a load instruction "LD4" at
an address "A4".

Assuming that the address "A4'" of the store instruction "ST1" is
equal to the address "A4" of the load instruction "LD4", the store
instruction "ST1" and the load instruction "LD4" access the same
address "A4", and the store instruction "ST1" is prior in program
sequence to the load instruction "LD4". A read after write dependence
from the store instruction "ST1" to the load instruction "LD4" is
therefore present.

The program expects that the store instruction "ST1" stores data at
the address "A4" and then the load instruction "LD4" reads this data
out from the address "A4", for which reason it is necessary that the
store instruction "ST1" is executed prior to the execution of the load
instruction "LD4" in accordance with the program sequence.

FIG. 1B is a diagram illustrative of executions of instructions in the
program sequence of FIG. 1A. Cycle numbers, execution instructions,
and addresses of the execution instructions are shown. It is assumed
that the address "A4'" of the store instruction "ST1" is not known
until the fifth cycle 5.

In the first cycle 1, the load instruction "LD1" at the address "A1"
is executed. In the fifth cycle 5, the store instruction "ST1" at the
address "A4'" is executed. In the sixth cycle 6, the load instruction
"LD2" at the address "A2" is executed. In the seventh cycle 7, the
load instruction "LD3" at the address "A3" is executed. In the eighth
cycle 8, the load instruction "LD4" at the address "A4" is executed.

Even if the addresses "A2", "A3" and "A4" become known in the second,
third and fourth cycles 2, 3 and 4 respectively, the load instructions
"LD2", "LD3" and "LD4" are inhibited from being executed in those
cycles, which are prior to the execution of the store instruction
"ST1" in the fifth cycle 5. This is because the address "A4'" of the
store instruction "ST1" is not known until the fifth cycle 5, and thus
the read after write dependence from the store instruction "ST1" to
each of the load instructions "LD2", "LD3" and "LD4" is not known
until the fifth cycle 5.

In accordance with the conventional program sequence execution, the
load/store instructions are inhibited from being executed in the
second, third and fourth cycles 2, 3 and 4. Eight cycles are necessary
to execute the five load/store instructions. The program sequence
execution may thus lower the effective performance of the processor.
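The eight-cycle timeline of FIG. 1B can be reproduced by the following
minimal Python sketch. It assumes a simplified model that is not
stated in the description itself: one instruction is executed per
cycle, and an instruction is executed as soon as its address is known
and all prior instructions have been executed. The function name and
the table of cycles in which each address becomes known are
illustrative.

```python
def in_order_schedule(program, known_cycle):
    """Return {instruction: execution cycle} for program sequence execution.

    program     -- instruction names in program sequence
    known_cycle -- first cycle in which each instruction's address is known
    """
    schedule = {}
    cycle = 0
    for name in program:
        # Wait for the previous instruction and for the address.
        cycle = max(cycle + 1, known_cycle[name])
        schedule[name] = cycle
    return schedule


PROGRAM = ["LD1", "ST1", "LD2", "LD3", "LD4"]
# Cycles taken from the description of FIG. 1B; the value for LD1 is
# assumed from the fact that LD1 is executed in the first cycle.
KNOWN = {"LD1": 1, "ST1": 5, "LD2": 2, "LD3": 3, "LD4": 4}

schedule = in_order_schedule(PROGRAM, KNOWN)
print(schedule)
print("total cycles:", max(schedule.values()))
```

Under these assumptions the loads "LD2", "LD3" and "LD4" wait behind
the store "ST1", and the last instruction completes in the eighth
cycle.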

The speculative execution in the non-program sequence, on the
assumption that the read after write dependence is not present, has
been known as prior art. The speculative execution of instructions
with respect to the dependence between data will hereinafter be
referred to as "data dependence speculative execution".

In accordance with the data dependence speculative execution, it is
possible in one case that the read after write dependence is actually
not present and the speculative execution results in success. It is
also possible in another case that the read after write dependence is
actually present and the speculative execution results in failure. At
the time when the read after write dependence becomes known, it is
necessary to judge which of the above two cases applies.

If the read after write dependence is actually not present and the
speculative execution results in success, the execution of the
subsequent instructions is allowed to continue, whereby the effective
performance of the processor is improved by the data dependence
speculative execution in the non-program sequence.

If, however, the read after write dependence is actually present and
the speculative execution results in failure, then the program meaning
is changed, and thus it is no longer possible to ensure the correct
result of the execution of the program. For this reason, the result
obtained by the data dependence speculative execution in the
non-program sequence is canceled, and a recovery process for the
failure of the data dependence speculative execution is necessary
instead. The recovery process for the failure of the data dependence
speculative execution is likely to lower the performance of the
processor in comparison with the execution in the program sequence.
If, however, the probability of success in the data dependence
speculative execution is sufficiently higher than the probability of
failure, then the effective performance of the processor for
processing the program may be improved in total.

The non-program sequence execution is disclosed by Mike Johnson in
"Super-scalar processor" 1994. The recovery process for the failure in
the data dependence speculative execution is disclosed in Japanese
laid-open patent publication No. 5-224927.

FIG. 1C is a diagram illustrative of one example of the data
dependence speculative execution which has resulted in success. Cycle
numbers, execution instructions, and addresses of the execution
instructions are shown. It is assumed that the address "A4'" of the
store instruction "ST1" is not known until the fifth cycle 5. It is
also assumed that the address "A2" of the load instruction "LD2"
becomes known in the second cycle 2, the address "A3" of the load
instruction "LD3" becomes known in the third cycle 3, and the address
"A4" of the load instruction "LD4" becomes known in the sixth cycle 6.

In the first cycle 1, the load instruction "LD1" at the address "A1"
is executed. In the second cycle 2, the load instruction "LD2" at the
address "A2" is executed in the non-program sequence because the
address "A2" of the load instruction "LD2" is known in the second
cycle 2, whilst the address "A4'" of the store instruction "ST1" is
not yet known in the second cycle 2. In the second cycle 2, the read
after write dependence from the store instruction "ST1" to the load
instruction "LD2" is not yet known. The load instruction "LD2" is
executed speculatively with respect to the store instruction "ST1".

In the third cycle 3, the load instruction "LD3" at the address "A3"
is executed in the non-program sequence because the address "A3" of
the load instruction "LD3" is known in the third cycle 3, whilst the
address "A4'" of the store instruction "ST1" is not yet known in the
third cycle 3. In the third cycle 3, the read after write dependence
from the store instruction "ST1" to the load instruction "LD3" is not
yet known. The load instruction "LD3" is executed speculatively with
respect to the store instruction "ST1".

In the fourth cycle 4, neither the address "A4'" of the store
instruction "ST1" nor the address "A4" of the load instruction "LD4"
is known yet. Neither the store instruction "ST1" nor the load
instruction "LD4" is executed.

In the fifth cycle 5, the address "A4'" of the store instruction "ST1"
becomes known, and the store instruction "ST1" is executed.
Concurrently, the read after write dependence from the store
instruction "ST1" to each of the load instruction "LD2" and the load
instruction "LD3" is judged. In this case, the address "A2" of the
load instruction "LD2" and the address "A3" of the load instruction
"LD3" are different from the address "A4'" of the store instruction
"ST1", and thus the read after write dependence is not present.

It is, therefore, judged that the data dependence speculative
executions of the load instruction "LD2" and the load instruction
"LD3" result in success. The subsequent instruction is continuously
executed. In the sixth cycle 6, the load instruction "LD4" is executed
in the program sequence with reference to the store instruction "ST1",
for which reason no problem is raised even if the read after write
dependence is present from the store instruction "ST1" to the load
instruction "LD4".

The program sequence execution shown in FIG. 1B needs eight cycles. By
contrast, the successful data dependence speculative execution in the
non-program sequence shown in FIG. 1C needs six cycles. The data
dependence speculative execution in the non-program sequence improves
the performance by two cycles, provided that the data dependence
speculative execution succeeds.
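The six-cycle timeline of FIG. 1C can be reproduced by the following
minimal Python sketch. It assumes a simplified model that is not
stated in the description itself: in each cycle, the oldest
instruction in program sequence whose address is already known is
executed, and at most one instruction is executed per cycle. The
function names are illustrative, and the single label "A4" stands for
both "A4'" and "A4", which the description assumes to be equal.

```python
def speculative_schedule(program, known_cycle):
    """Return {instruction: execution cycle} for non-program sequence execution."""
    pending = list(program)
    schedule = {}
    cycle = 0
    while pending:
        cycle += 1
        ready = [name for name in pending if known_cycle[name] <= cycle]
        if ready:
            chosen = ready[0]  # oldest ready instruction in program sequence
            schedule[chosen] = cycle
            pending.remove(chosen)
    return schedule


def violated_loads(program, address, schedule):
    """Loads executed before a prior store to the same address."""
    violations = []
    for pos, name in enumerate(program):
        if not name.startswith("ST"):
            continue
        for later in program[pos + 1:]:
            if (later.startswith("LD")
                    and schedule[later] < schedule[name]
                    and address[later] == address[name]):
                violations.append(later)
    return violations


PROGRAM = ["LD1", "ST1", "LD2", "LD3", "LD4"]
ADDRESS = {"LD1": "A1", "ST1": "A4", "LD2": "A2", "LD3": "A3", "LD4": "A4"}
# Cycles in which each address becomes known, as described for FIG. 1C.
KNOWN = {"LD1": 1, "ST1": 5, "LD2": 2, "LD3": 3, "LD4": 6}

schedule = speculative_schedule(PROGRAM, KNOWN)
print(schedule)
print("violations:", violated_loads(PROGRAM, ADDRESS, schedule))
print("total cycles:", max(schedule.values()))
```

Under these assumptions no load to the address "A4" is executed before
the store "ST1", so the list of violations is empty and the
speculation succeeds.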

FIG. 1D is a diagram illustrative of one example of the data
dependence speculative execution which has resulted in failure. Cycle
numbers, execution instructions, and addresses of the execution
instructions are shown. It is assumed that the address "A4'" of the
store instruction "ST1" is not known until the fifth cycle 5. It is
also assumed that the address "A2" of the load instruction "LD2"
becomes known in the second cycle 2, the address "A3" of the load
instruction "LD3" becomes known in the third cycle 3, and the address
"A4" of the load instruction "LD4" becomes known in the fourth cycle
4.

In the first cycle 1, the load instruction "LD1" at the address "A1"
is executed. In the second cycle 2, the load instruction "LD2" at the
address "A2" is executed in the non-program sequence because the
address "A2" of the load instruction "LD2" is known in the second
cycle 2, whilst the address "A4'" of the store instruction "ST1" is
not yet known in the second cycle 2. In the second cycle 2, the read
after write dependence from the store instruction "ST1" to the load
instruction "LD2" is not yet known. The load instruction "LD2" is
executed speculatively with respect to the store instruction "ST1".

In the third cycle 3, the load instruction "LD3" at the address "A3"
is executed in the non-program sequence because the address "A3" of
the load instruction "LD3" is known in the third cycle 3, whilst the
address "A4'" of the store instruction "ST1" is not yet known in the
third cycle 3. In the third cycle 3, the read after write dependence
from the store instruction "ST1" to the load instruction "LD3" is not
yet known. The load instruction "LD3" is executed speculatively with
respect to the store instruction "ST1".

In the fourth cycle 4, the address "A4'" of the store instruction
"ST1" is not yet known, whilst the address "A4" of the load
instruction "LD4" has become known. The load instruction "LD4" is
executed speculatively.

In the fifth cycle 5, the address "A4'" of the store instruction "ST1"
becomes known, and the store instruction "ST1" is executed.
Concurrently, the read after write dependence from the store
instruction "ST1" to each of the load instruction "LD2", the load
instruction "LD3" and the load instruction "LD4" is judged. In this
case, the address "A2" of the load instruction "LD2" and the address
"A3" of the load instruction "LD3" are different from the address
"A4'" of the store instruction "ST1", and thus the read after write
dependence to these load instructions is not present.

Since, however, the address "A4" of the load instruction "LD4" is the
same as the address "A4'" of the store instruction "ST1", the read
after write dependence from the store instruction "ST1" to the load
instruction "LD4" is present. Even though the read after write
dependence from the store instruction "ST1" to the load instruction
"LD4" is present, the non-program sequence execution has already been
carried out, for which reason the data dependence speculative
execution of the load instruction "LD4" is judged to be a failure.

In order to ensure the correct result of the execution of the program,
it is necessary to perform the recovery process for the failure of the
data dependence speculative execution.

In the fifth cycle 5, the failure of the data dependence speculative
execution is judged. The execution results of the load instruction
"LD2" in the second cycle 2, the load instruction "LD3" in the third
cycle 3, the load instruction "LD4" in the fourth cycle 4, and the
store instruction "ST1" in the fifth cycle 5 are canceled.
Re-executions of the store instruction "ST1" in the seventh cycle 7,
the load instruction "LD2" in the eighth cycle 8, the load instruction
"LD3" in the ninth cycle 9, and the load instruction "LD4" in the
tenth cycle 10 are made as the recovery process for the failure of the
data dependence speculative execution.

The executions of the five instructions, including the store
instruction "ST1", the load instruction "LD2", the load instruction
"LD3" and the load instruction "LD4", need ten cycles. The program
sequence execution shown in FIG. 1B needs eight cycles. By contrast,
the failed data dependence speculative execution in the non-program
sequence shown in FIG. 1D needs ten cycles. The data dependence
speculative execution in the non-program sequence deteriorates the
performance by two cycles when the data dependence speculative
execution fails.
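The failure judgment and the recovery process of FIG. 1D can be traced
by the following minimal Python sketch. The function name and data
layout are illustrative. The two-cycle delay between the failure
judgment and the first re-execution is an assumption read off from
FIG. 1D, in which the failure is judged in the fifth cycle and the
re-execution begins in the seventh cycle. The single label "A4" stands
for both "A4'" and "A4", which the description assumes to be equal.

```python
PROGRAM = ["LD1", "ST1", "LD2", "LD3", "LD4"]
ADDRESS = {"LD1": "A1", "ST1": "A4", "LD2": "A2", "LD3": "A3", "LD4": "A4"}
# Speculative execution cycles as described for FIG. 1D.
EXECUTED = {"LD1": 1, "LD2": 2, "LD3": 3, "LD4": 4, "ST1": 5}


def recover(program, address, executed, store, restart_delay=2):
    """Judge the speculation when `store` executes and replay on failure.

    Returns (failed_loads, final_schedule). restart_delay is the assumed
    number of cycles between the failure judgment and the first
    re-executed instruction.
    """
    pos = program.index(store)
    failed = [name for name in program[pos + 1:]
              if executed[name] < executed[store]
              and address[name] == address[store]]
    if not failed:
        return failed, dict(executed)
    # Keep results of instructions prior to the store; cancel the rest.
    final = {name: executed[name] for name in program[:pos]}
    cycle = executed[store] + restart_delay
    for name in program[pos:]:
        final[name] = cycle  # re-executed in the program sequence
        cycle += 1
    return failed, final


failed, final = recover(PROGRAM, ADDRESS, EXECUTED, "ST1")
print("failed loads:", failed)
print(final)
print("total cycles:", max(final.values()))
```

Under these assumptions only the load "LD4" violates the read after
write dependence, yet all instructions from "ST1" onward are replayed,
which is why the failed speculation ends two cycles later than the
program sequence execution.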

If, however, the probability of success in the data dependence
speculative execution is sufficiently higher than the probability of
failure, then the effective performance of the processor for
processing the program may be improved in total.
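The trade-off can be made concrete with the cycle counts of FIGS. 1B,
1C and 1D. The following Python sketch is a simple expected-value
calculation under the assumption that each run either succeeds (six
cycles) or fails (ten cycles) with a fixed probability; the function
name and the probability model are illustrative and do not appear in
the description.

```python
IN_ORDER_CYCLES = 8   # program sequence execution, FIG. 1B
SUCCESS_CYCLES = 6    # successful speculation, FIG. 1C
FAILURE_CYCLES = 10   # failed speculation with recovery, FIG. 1D


def expected_cycles(p_success):
    """Average cycle count when speculation succeeds with probability p_success."""
    return p_success * SUCCESS_CYCLES + (1.0 - p_success) * FAILURE_CYCLES


# Success probability at which speculation merely matches in-order execution.
break_even = (FAILURE_CYCLES - IN_ORDER_CYCLES) / (FAILURE_CYCLES - SUCCESS_CYCLES)

print("break-even success probability:", break_even)
for p in (0.25, 0.5, 0.9):
    print(p, expected_cycles(p))
```

For these particular cycle counts, speculation pays off on average
only when it succeeds more often than it fails.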

For allowing the processor to perform the data dependence speculative
execution, it is necessary to judge the presence of the read after
write dependence between the load/store instructions with reference to
the memory. A data dependence detector has been known for detecting
the presence of the read after write dependence between the load/store
instructions. The conventional data dependence detector is disclosed
by Manoj Franklin et al., entitled "ARB: A Hardware Mechanism For
Dynamic Reordering Of Memory References", IEEE Transactions On
Computers, vol. 45, No. 5, May, 1996.

FIG. 2 is a diagram illustrative of a conventional data dependence
detector. The conventional data dependence detector 100 includes
address buffers 101, address comparators 102, and a logic-OR circuit
103. The address buffers 101 store plural load addresses of the load
instructions. The address comparators 102 are connected to the address
buffers 101 for comparing the plural load addresses of the load
instructions stored in the address buffers 101 with a store address of
the store instruction which has just been executed. The logic-OR
circuit 103 takes a logical-OR of all of the compared results from the
address comparators 102 and outputs a data dependence detected result.

The detection of the read after write dependence from the store
instruction to the load instruction is realized by the following
operations of the data dependence detector 100. If the load
instruction is executed by the data dependence speculative execution,
the address of the load instruction is stored into a free address
buffer 101. Subsequently, a store instruction is executed. The address
of the store instruction is inputted into all of the plural address
comparators 102, so that the plural address comparators 102 compare
the load addresses of the executed load instructions with the inputted
store address of the store instruction just executed and output the
compared results, which are transmitted to the logic-OR circuit 103.

The logic-OR circuit 103 takes the logical-OR of all of the compared
results from the plural address comparators 102, and outputs the data
dependence detected result. If the store address of the store
instruction does not correspond to any of the load addresses of the
load instructions stored in the address buffers 101, then it is judged
that the read after write dependence from the store instruction to any
of the load instructions is not present. The data dependence detected
result indicates that the read after write dependence from the store
instruction to any of the load instructions is not present. This means
that the data dependence speculative execution has resulted in
success. Subsequent instructions will continuously be executed.

If the store address of the store instruction does correspond to any
one of the load addresses of the load instructions stored in the
address buffers 101, then it is judged that the read after write
dependence from the store instruction to the corresponding one of the
load instructions is present. The data dependence detected result
indicates that the read after write dependence from the store
instruction to that load instruction is present. This means that the
data dependence speculative execution has resulted in failure. The
recovery process for the failure of the data dependence speculative
execution will subsequently be carried out.
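The operation of the conventional data dependence detector 100 can be
modeled in software by the following minimal Python sketch. It is a
behavioral illustration only: the class and method names are
hypothetical, and the hardware compares all addresses in parallel,
whereas the sketch simply evaluates one comparison per address buffer
and then takes the logical-OR of the results.

```python
class ConventionalDetector:
    """Behavioral model of address buffers, comparators and a logic-OR."""

    def __init__(self, num_buffers):
        # One slot per address buffer 101; None marks a free buffer.
        self.buffers = [None] * num_buffers

    def record_load(self, load_address):
        """Store the address of a speculatively executed load.

        Returns False when no address buffer is free, in which case the
        load cannot be subjected to speculative execution.
        """
        for index, slot in enumerate(self.buffers):
            if slot is None:
                self.buffers[index] = load_address
                return True
        return False

    def check_store(self, store_address):
        """Return True if a read after write dependence is detected."""
        # One comparison per buffer models the address comparators 102.
        compared = [slot is not None and slot == store_address
                    for slot in self.buffers]
        # any() models the logic-OR circuit 103.
        return any(compared)


detector = ConventionalDetector(num_buffers=3)
detector.record_load("A2")
detector.record_load("A3")
print(detector.check_store("A4"))   # no stored load address matches
detector.record_load("A4")
print(detector.check_store("A4"))   # a stored load address matches
print(detector.record_load("A5"))   # all three buffers are occupied
```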

The above conventional data dependence detector 100 has the following
two problems.

The first problem is that the necessary hardware size is large,
because ensuring the exactly correct execution result of the program
in the data dependence speculative execution requires every read after
write dependence to be detected without omission.

The conventional data dependence detector 100 stores the load
addresses of all the load instructions executed by the data dependence
speculative execution into the address buffers 101, and then the
address comparators 102 compare the load addresses with the store
address of the store instruction. If no free space is present in the
address buffers 101, then it is no longer possible to subject the load
instructions to the data dependence speculative execution.

In this case, the subsequent load/store instructions are executed in
the program sequence. The number of the load instructions which may be
executed by the data dependence speculative execution is limited by
both the number of the address buffers 101 and the number of the
address comparators 102. In order to improve the performance of the
data dependence speculative execution, a large number of the address
buffers 101 and a large number of the address comparators 102 are
needed, whereby the necessary hardware size is large.

A second problem is that the speed of detecting the read after write
dependence is slow. In order to detect the read after write
dependence, it is necessary to take not only a time for processing the
address comparison by the address comparators 102 but also a time for
the logic operation on the outputs from the address comparators 102.
This makes it difficult to improve the high frequency performance of
the processor.

As the number of the address buffers 101 and the number of the 
address comparators 102 are increased, the number of the inputs into the 
logic-OR circuit 103 is also increased, whereby the above disadvantages 
become more remarkable. 

Accordingly, the conventional data dependence detector needs a large
hardware size for improving the performance of the data dependence
speculative execution. The large hardware size increases the necessary
time for processing the detection of the read after write dependence,
thereby making it difficult to improve the high speed performance of
the processor.

In the above circumstances, the development of a novel data dependence
detector free from the above problems is desirable.

SUMMARY OF THE INVENTION 

Accordingly, it is an object of the present invention to provide a
novel data dependence detector free from the above problems.

It is a further object of the present invention to provide a novel
data dependence detector with a small hardware size which is, however,
capable of the data dependence speculative execution for obtaining a
substantive improvement in effective performance of the processor,
whereby the number of the load instructions which may be executed by
the data dependence speculative execution is not limited by the
hardware size of the data dependence detector.

It is a still further object of the present invention to provide a
novel processor including a data dependence detector free from the
above problems.

It is yet a further object of the present invention to provide a novel
processor including a data dependence detector with a small hardware
size which is, however, capable of the data dependence speculative
execution for obtaining a substantive improvement in effective
performance of the processor, whereby the number of the load
instructions which may be executed by the data dependence speculative
execution is not limited by the hardware size of the data dependence
detector.

It is a still further object of the present invention to provide a
novel multi-processor system including a plurality of processors, each
of which includes a data dependence detector free from the above
problems.

It is yet a further object of the present invention to provide a novel
multi-processor system including a plurality of processors, each of
which includes a data dependence detector with a small hardware size
which is, however, capable of the data dependence speculative
execution for obtaining a substantive improvement in effective
performance of the processor, whereby the number of the load
instructions which may be executed by the data dependence speculative
execution is not limited by the hardware size of the data dependence
detector.

The present invention provides a detector for detecting at least one
kind of dependence in address between instructions executed by at
least a processor, the detector being adapted to detect a possibility
of presence of the at least one kind of dependence, wherein if the at
least one kind of dependence is present in fact, then the detector
detects a possibility of presence of the at least one kind of
dependence, and if the at least one kind of dependence is not present
in fact, then the detector is nevertheless allowed to detect the at
least one kind of dependence.
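The property stated above, namely that an actual dependence is always
detected whilst a dependence that is not actually present may also be
reported, can be illustrated by the following Python sketch. The
structure shown is purely a hypothetical illustration chosen for this
example: this section does not describe how the detector is built, and
the table of flags indexed by low-order address bits, the class name
and the parameter index_bits are all assumptions.

```python
class ConservativeDetector:
    """Hypothetical detector: never misses, may report falsely."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # One flag per table entry, regardless of how many loads are recorded.
        self.flags = [False] * (1 << index_bits)

    def record_load(self, load_address):
        # Many addresses share one entry, so the table never fills up.
        self.flags[load_address & self.mask] = True

    def check_store(self, store_address):
        """Return True if a dependence is possibly present."""
        return self.flags[store_address & self.mask]


detector = ConservativeDetector(index_bits=4)
detector.record_load(0x104)

print(detector.check_store(0x104))  # actual dependence: always reported
print(detector.check_store(0x204))  # different address, same entry: reported falsely
print(detector.check_store(0x105))  # different entry: not reported
```

In such a scheme a false report only triggers an unnecessary recovery
process and does not affect the correctness of the execution result,
whereas a missed dependence would. That asymmetry is what makes it
acceptable for the detector to report a dependence that is not
actually present.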

The present invention also provides a detector provided in a
self-processor included in a multiple processor system including the
self-processor and at least a processor other than the self-processor,
the multiple processor system performing parallel processing in thread
units of a program, and the detector detecting at least one kind of
dependence in address between an instruction included in a thread
executed by the self-processor and an instruction included in a thread
executed by the other processor, the detector being adapted to detect
a possibility of presence of the at least one kind of dependence,
wherein if the at least one kind of dependence is present in fact,
then the detector detects a possibility of presence of the at least
one kind of dependence, and if the at least one kind of dependence is
not present in fact, then the detector is nevertheless allowed to
detect the at least one kind of dependence.

The present invention also provides a detector provided hi a self- 
processor included in a multiple processor System including the self- 
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processor and at least a processor other than the self-processor, and the 
multiple processor system performing parallel processings in thread units 
of program, and the detector detecting at least one kind of dependence in 
address between an instruction included in a thread executed by the self-
processor and an instruction included in a thread executed by the other
processor, the detector being adopted to detect a possibility of presence of 
the at least one kind of dependence, as well as the detector detecting at least 
one kind of dependence in address between instructions included in a 
thread executed by the self-processor, wherein if the at least one kind of
dependence is present in fact, then the detector detects a possibility of
presence of the at least one kind of dependence, and if the at least one kind 
of dependence is not present in fact, then the detector is allowed to detect 
the at least one kind of dependence. 

The above and other objects, features and advantages of the
present invention will be apparent from the following descriptions.

BRIEF DESCRIPTION OF THE DRAWINGS 

Preferred embodiments according to the present invention will be 
described in detail with reference to the accompanying drawings.

FIG. 1A is a diagram illustrative of a program sequence of 
store/load instructions. 

FIG. 1B is a diagram illustrative of executions of instructions in
the program sequence of FIG. 1A.

FIG. 1C is a diagram illustrative of one example of the data
dependence speculative execution which has resulted in success.

FIG. 1D is a diagram illustrative of one example of the data
dependence speculative execution which has resulted in failure.

FIG. 2 is a diagram illustrative of a conventional data
dependence detector.

FIG. 3 is a schematic view of an internal configuration of a data 
processor including a data dependence detector in a first embodiment in 
accordance with the present invention. 

FIG. 4 is a timing chart showing operations of the data
dependence detector.

FIG. 5 is a block diagram illustrative of a structure of the data 
dependence detector.

FIG. 6A is a diagram illustrative of the instruction program
sequence involved in operations of the data dependence detector.

FIG. 6B is a diagram illustrative of the instruction execution 
sequence involved in operations of the data dependence detector. 

FIG. 6C is a diagram illustrative of the data dependence detective 
unit involved in operations of the data dependence detector with reference 
to the instruction execution sequence.

FIG. 7A is a diagram illustrative of a first example of the hash 
function circuit in the first embodiment in accordance with the present 
invention. 

FIG. 7B is a diagram illustrative of a first example of the hash
function circuit in the second embodiment in accordance with the present
invention. 

FIG. 7C is a diagram illustrative of a first example of the hash 
function circuit in the third embodiment in accordance with the present 
invention.

FIG. 8 is a diagram illustrative of a structure of a multiple 
processor system including a data dependence detector in a second 
embodiment in accordance with the present invention. 

FIG. 9A is a diagram illustrative of a program sequence of
instructions.

FIG. 9B is a diagram illustrative of the instruction execution 
sequences of the data dependence speculative execution in the thread 
parallel execution. 

FIG. 9C is a diagram illustrative of the data dependence detection 
operation by the data dependence detector in the processor in the
instruction execution sequence shown in FIG. 9B. 

FIG. 10 is a diagram illustrative of a structure of a processor 
included in a multiple processor system in a third embodiment in 
accordance with the present invention. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

A first aspect of the present invention is a detector for detecting
at least one kind of dependence in address between instructions executed
by at least a processor, the detector being adopted to detect a possibility of
presence of the at least one kind of dependence, wherein if the at least one
kind of dependence is present in fact, then the detector detects a possibility
of presence of the at least one kind of dependence, and if the at least one
kind of dependence is not present in fact, then the detector is allowed to
detect the at least one kind of dependence.

It is possible that the instruction comprises a memory access 
instruction for access to a memory. 

It is possible that the detector includes : an execution history
storing unit including a plurality of entries which store an instruction
execution information of whether the memory access instruction has been
executed ; and an address converter for converting an address of the
memory access instruction into an entry number of the entry of the
execution history storing unit, so that the execution history storing unit
stores the instruction execution information into the entry designated by the
entry number.

It is possible that the address converter is adopted to convert a
same address of different memory access instructions into a same entry
number for allowing that the same address for the different memory access
instructions is stored in the same entry, whereby if the at least one kind of
dependence is not present in fact, then the detector is adopted to detect the
at least one kind of dependence.
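By way of illustration only, the relation between the execution history storing unit and the address converter may be sketched as follows. The class name `ExecutionHistory`, the 16-entry size and the choice of address bits are assumptions of this sketch and are not taken from the specification. The sketch shows that the same address always reaches the same entry, and that a different address reaching that entry may be reported even though no dependence is present in fact.

```python
class ExecutionHistory:
    """Hypothetical model of the execution history storing unit."""

    def __init__(self, entry_bits=4):
        self.entry_bits = entry_bits
        # One flag per entry: has a memory access mapped to this entry been executed?
        self.entries = [False] * (1 << entry_bits)

    def entry_number(self, address):
        # Address converter (assumed form): drop a 4-byte word offset,
        # then keep the low-order entry_bits bits.
        return (address >> 2) & ((1 << self.entry_bits) - 1)

    def record(self, address):
        self.entries[self.entry_number(address)] = True

    def was_executed(self, address):
        return self.entries[self.entry_number(address)]


history = ExecutionHistory(entry_bits=4)
history.record(0x1000)

same_address = history.was_executed(0x1000)       # same address, same entry
aliased_address = history.was_executed(0x2000)    # different address, same entry
unrelated_address = history.was_executed(0x1004)  # different address, different entry
```

In this sketch `aliased_address` is reported as executed although address 0x2000 was never accessed, which corresponds to the detector being allowed to detect a dependence that is not present in fact.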

It is possible that the memory access instruction comprises either
a load instruction or a store instruction.

It is possible that the one kind of the dependence is a read after
write dependence, and if the load instruction has been executed, then the
address converter is adopted to convert an address of the load instruction
into an entry number, and the detector stores an instruction execution
information, that the load instruction has been executed, into the entry
designated by the entry number of the execution history storing unit, and if
the store instruction has been executed, then the address converter is
adopted to convert an address of the store instruction into an entry number,
and the detector reads an instruction execution information, whether or not
the load instruction has been executed, out from the entry designated by the
entry number of the execution history storing unit, whereby the detector
detects a possibility of presence of the read after write dependence from the
store instruction as executed into the load instruction as executed.
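The read after write case described above may be sketched as follows, purely as an illustration. The function names `on_load` and `on_store`, the table size of eight entries and the modulo form of the address converter are assumptions of this sketch. A load writes its execution information into the entry; a later executed store reads that entry and reports a possible dependence.

```python
NUM_ENTRIES = 8  # assumed table size
load_executed = [False] * NUM_ENTRIES


def entry_number(address):
    # Assumed address converter: word address modulo the number of entries.
    return (address >> 2) % NUM_ENTRIES


def on_load(address):
    # A load has been executed: store that fact into the designated entry.
    load_executed[entry_number(address)] = True


def on_store(address):
    # A store has been executed: read the designated entry. True means a
    # read after write dependence from this store to an earlier executed
    # load is possibly present.
    return load_executed[entry_number(address)]


on_load(0x104)
raw_hit = on_store(0x104)   # same address as the executed load
raw_miss = on_store(0x108)  # maps to an entry that no load has touched
```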

It is possible that the one kind of the dependence is a write after
read dependence, and if the store instruction has been executed, then the
address converter is adopted to convert an address of the store instruction
into an entry number, and the detector stores an instruction execution
information, that the store instruction has been executed, into the entry
designated by the entry number of the execution history storing unit, and if
the load instruction has been executed, then the address converter is
adopted to convert an address of the load instruction into an entry number,
and the detector reads an instruction execution information, whether or not
the store instruction has been executed, out from the entry designated by
the entry number of the execution history storing unit, whereby the detector
detects a possibility of presence of the write after read dependence from the
load instruction as executed into the store instruction as executed.

It is possible that the one kind of the dependence is a write after
write dependence, and if the store instruction has been executed, then the
address converter is adopted to convert an address of the store instruction
into an entry number, and the detector reads an instruction execution
information, whether or not the store instruction has been executed, out
from the entry designated by the entry number of the execution history
storing unit, and also the detector stores an instruction execution
information, that the store instruction has been executed, into the entry
designated by the entry number of the execution history storing unit,
whereby the detector detects a possibility of presence of the write after
write dependence between the store instructions as executed.
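In the write after write case a single store both reads and writes the same entry, so the order of the two operations matters. The following sketch, given only as an illustration, reads the old state before recording the current store; the names and the table size are assumptions of this sketch.

```python
NUM_ENTRIES = 8  # assumed table size
store_executed = [False] * NUM_ENTRIES


def entry_number(address):
    # Assumed address converter: word address modulo the number of entries.
    return (address >> 2) % NUM_ENTRIES


def on_store(address):
    index = entry_number(address)
    # Read first: has an earlier store already been recorded in this entry?
    possible_waw = store_executed[index]
    # Then record the current store in the same entry.
    store_executed[index] = True
    return possible_waw


first_store = on_store(0x200)   # nothing recorded yet
second_store = on_store(0x200)  # sees the first store
other_store = on_store(0x204)   # different entry, nothing recorded there
```

Had the entry been written before being read, every store would report a dependence on itself.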

It is possible that the one kind of the dependence is a read after
write dependence, a write after read dependence and a write after write
dependence and if the load instruction has been executed, then the address
converter is adopted to convert an address of the load instruction into an
entry number, and the detector reads an instruction execution information,
whether or not the store instruction has been executed, from the entry
designated by the entry number of the execution history storing unit, and
also the detector stores an instruction execution information, that the load
instruction has been executed, into the entry designated by the entry
number of the execution history storing unit, and if the store instruction has
been executed, then the address converter is adopted to convert an address
of the store instruction into an entry number, and the detector reads
instruction execution information, whether or not the load instruction has
been executed and whether or not the store instruction has been executed,
out from the entry designated by the entry number of the execution history
storing unit, and also the detector stores an instruction execution
information, that the store instruction has been executed, into the entry
designated by the entry number of the execution history storing unit,
whereby the detector detects a possibility of presence of the read after write
dependence from the store instruction as executed into the load instruction
as executed, and also detects a possibility of presence of the write after read
dependence from the load instruction as executed into the store instruction
as executed as well as detects a possibility of presence of the write after
write dependence between the store instructions as executed.
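When all three kinds of dependence are detected together, each entry may be regarded as holding two items of execution information, one for loads and one for stores. The following is a minimal sketch under that assumption; the names `Entry` and `DependenceDetector`, the 16-entry table and the string labels are hypothetical and serve only to illustrate the reads and writes described above.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    load_done: bool = False   # a load mapped to this entry has been executed
    store_done: bool = False  # a store mapped to this entry has been executed


class DependenceDetector:
    def __init__(self, num_entries=16):
        self.entries = [Entry() for _ in range(num_entries)]

    def _entry(self, address):
        # Assumed address converter: word address modulo the table size.
        return self.entries[(address >> 2) % len(self.entries)]

    def load(self, address):
        entry = self._entry(address)
        # Read the store information, then record this load.
        found = {"WAR"} if entry.store_done else set()
        entry.load_done = True
        return found

    def store(self, address):
        entry = self._entry(address)
        # Read both items of information, then record this store.
        found = set()
        if entry.load_done:
            found.add("RAW")
        if entry.store_done:
            found.add("WAW")
        entry.store_done = True
        return found


detector = DependenceDetector()
after_first_load = detector.load(0x40)
after_first_store = detector.store(0x40)
after_second_store = detector.store(0x40)
after_second_load = detector.load(0x40)
untouched_entry = detector.load(0x44)
```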

It is possible that the address converter is adopted to select plural
bits of the address inputted and output the plural bits as indicating the entry
number.

It is possible that the address converter is adopted to take an 
exclusive-OR of the selected plural bits and output a value of the exclusive- 
OR as indicating the entry number. 
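The two forms of address conversion mentioned above, selection of plural bits and an exclusive-OR of selected bits, may be sketched as follows. The function names and the particular bit positions are assumptions of this sketch; the specification does not fix which bits are selected.

```python
def select_bits(address, positions):
    """Concatenate the selected address bits into an entry number.

    positions lists bit indices from most significant to least significant.
    """
    value = 0
    for position in positions:
        value = (value << 1) | ((address >> position) & 1)
    return value


def xor_fold(address, position_pairs):
    """Form each entry-number bit as the exclusive-OR of two address bits."""
    value = 0
    for high, low in position_pairs:
        bit = ((address >> high) & 1) ^ ((address >> low) & 1)
        value = (value << 1) | bit
    return value


address = 0b10110100
selected = select_bits(address, [3, 2])          # bits 3 and 2 are 0 and 1
folded = xor_fold(address, [(5, 3), (4, 2)])     # (1 xor 0), (1 xor 1)
```

Folding higher bits into the entry number in this way lets addresses that share their low-order bits fall into different entries, which may reduce the number of dependences detected that are not present in fact.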

A second aspect of the present invention is a detector provided in
a self-processor included in a multiple processor system including the self-
processor and at least a processor other than the self-processor, and the
multiple processor system performing parallel processings in thread units
of program, and the detector detecting at least one kind of dependence in
address between an instruction included in a thread executed by the self-
processor and an instruction included in a thread executed by the other
processor, the detector being adopted to detect a possibility of presence of
the at least one kind of dependence, wherein if the at least one kind of
dependence is present in fact, then the detector detects a possibility of
presence of the at least one kind of dependence, and if the at least one kind
of dependence is not present in fact, then the detector is allowed to detect
the at least one kind of dependence.

It is possible that the instruction comprises a memory access
instruction for access to a memory.

It is possible that the detector includes : an execution history
storing unit including a plurality of entries which store an instruction
execution information of whether the memory access instruction has been
executed ; and an address converter for converting an address of the
memory access instruction into an entry number of the entry of the
execution history storing unit, so that the execution history storing unit
stores the instruction execution information into the entry designated by the
entry number.

It is possible that the address converter is adopted to convert a
same address of different memory access instructions into a same entry
number for allowing that the same address for the different memory access
instructions is stored in the same entry, whereby if the at least one kind of
dependence is not present in fact, then the detector is adopted to detect the
at least one kind of dependence.




It is possible that only if the memory access instruction has been
executed by a processor other than the self-processor, and the other processor
is to execute a thread which is prior in program sequence to the thread
executed by the self-processor, then the address converter is adopted to
convert an address of the memory access instruction into an entry number
of the entry, and the detector reads an instruction execution information,
whether or not the memory access instruction has been executed, from an
entry designated by the entry number.

It is possible that if the memory access instruction has been
executed by a processor other than the self-processor, then the address
converter is adopted to convert an address of the memory access instruction
into an entry number of the entry, and the detector reads an instruction
execution information, whether or not the memory access instruction has
been executed, from an entry designated by the entry number by referring
only to a value which has been read out by the memory access instruction
which had been executed by the other processor executing a thread which is
prior in program sequence to the thread executed by the self-processor.

It is possible that the memory access instruction comprises either 
a load instruction or a store instruction. 

It is possible that the one kind of the dependence is a read after
write dependence, and if the load instruction has been executed by the self-
processor, then the address converter is adopted to convert an address of the
load instruction into an entry number, and the detector stores an instruction
execution information, that the load instruction has been executed by the
self-processor, into the entry designated by the entry number of the
execution history storing unit, and if the store instruction has been executed
by the other processor, then the address converter is adopted to convert an
address of the store instruction into an entry number, and the detector reads
an instruction execution information, whether or not the load instruction
has been executed by the self-processor, out from the entry designated by
the entry number of the execution history storing unit, whereby the detector
detects a possibility of presence of the read after write dependence from the
store instruction as executed by the other processor into the load instruction
as executed by the self-processor.
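The read after write case between threads may be sketched as follows, given only as an illustration. Loads executed by the self-processor are recorded, and a store reported by another processor is checked only when that processor executes a thread prior in program sequence. The class name `ThreadDetector` and the representation of program sequence by an integer `thread_order` (smaller means prior) are assumptions of this sketch.

```python
class ThreadDetector:
    """Hypothetical detector held by the self-processor."""

    def __init__(self, self_thread_order, num_entries=8):
        self.self_thread_order = self_thread_order
        self.load_executed = [False] * num_entries

    def _entry_number(self, address):
        # Assumed address converter: word address modulo the table size.
        return (address >> 2) % len(self.load_executed)

    def self_load(self, address):
        # Load executed by the self-processor: record it.
        self.load_executed[self._entry_number(address)] = True

    def other_store(self, address, other_thread_order):
        # Store executed by another processor. Only a thread prior in
        # program sequence can be the source of a read after write dependence.
        if other_thread_order >= self.self_thread_order:
            return False
        return self.load_executed[self._entry_number(address)]


detector = ThreadDetector(self_thread_order=2)
detector.self_load(0x80)

from_prior_thread = detector.other_store(0x80, other_thread_order=1)
from_later_thread = detector.other_store(0x80, other_thread_order=3)
different_entry = detector.other_store(0x84, other_thread_order=1)
```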

It is possible that the one kind of the dependence is a write after
read dependence, and if the store instruction has been executed by the self-
processor, then the address converter is adopted to convert an address of the
store instruction into an entry number, and the detector stores an instruction
execution information, that the store instruction has been executed by the
self-processor, into the entry designated by the entry number of the
execution history storing unit, and if the load instruction has been executed
by the other processor, then the address converter is adopted to convert an
address of the load instruction into an entry number, and the detector reads
an instruction execution information, whether or not the store instruction
has been executed by the self-processor, out from the entry designated by
the entry number of the execution history storing unit, whereby the detector
detects a possibility of presence of the write after read dependence from the
load instruction as executed by the other processor into the store instruction
as executed by the self-processor.

It is possible that the one kind of the dependence is a write after
write dependence, and if the store instruction has been executed by the
other processor, then the address converter is adopted to convert an address
of the store instruction into an entry number, and the detector reads an
instruction execution information, whether or not the store instruction has
been executed by the self-processor, out from the entry designated by the
entry number of the execution history storing unit, and if the store instruction
has been executed by the self-processor, then the address converter is
adopted to convert an address of the store instruction into an entry number,
and the detector stores an instruction execution information, that the store
instruction has been executed by the self-processor, into the entry
designated by the entry number of the execution history storing unit,
whereby the detector detects a possibility of presence of the write after
write dependence from the store instruction as executed by the other
processor to the store instruction as executed by the self-processor.

It is possible that the one kind of the dependence is a read after
write dependence, a write after read dependence and a write after write
dependence and if the load instruction has been executed by the other
processor, then the address converter is adopted to convert an address of the
load instruction into an entry number, and the detector reads an instruction
execution information, whether or not the store instruction has been
executed by the self-processor, from the entry designated by the entry
number of the execution history storing unit, and if the store instruction has
been executed by the other processor, then the address converter is adopted
to convert an address of the store instruction into an entry number, and the
detector reads an instruction execution information, whether or not the load
instruction has been executed by the self-processor, from the entry
designated by the entry number of the execution history storing unit, and if the
load instruction has been executed by the self-processor, then the address
converter is adopted to convert an address of the load instruction into an
entry number, and the detector stores an instruction execution information,
that the load instruction has been executed by the self-processor, into the
entry designated by the entry number of the execution history storing unit,
and if the store instruction has been executed by the self-processor, then the
address converter is adopted to convert an address of the store instruction
into an entry number, and the detector stores an instruction execution
information, that the store instruction has been executed by the self-
processor, into the entry designated by the entry number of the execution
history storing unit, whereby the detector detects a possibility of presence
of the read after write dependence from the store instruction as executed by
the other processor into the load instruction as executed by the self-
processor, and also detects a possibility of presence of the write after read
dependence from the load instruction as executed by the other processor
into the store instruction as executed by the self-processor as well as
detects a possibility of presence of the write after write dependence from
the store instruction as executed by the other processor into the store
instruction as executed by the self-processor.

It is possible that the address converter is adopted to select plural
bits of the address inputted and output the plural bits as indicating the entry
number.

It is possible that the address converter is adopted to take an
exclusive-OR of the selected plural bits and output a value of the exclusive-
OR as indicating the entry number.

A third aspect of the present invention is a detector provided in a
self-processor included in a multiple processor system including the self-
processor and at least a processor other than the self-processor, and the
multiple processor system performing parallel processings in thread units
of program, and the detector detecting at least one kind of dependence in
address between an instruction included in a thread executed by the self-
processor and an instruction included in a thread executed by the other
processor, the detector being adopted to detect a possibility of presence of
the at least one kind of dependence, as well as the detector detecting at least
one kind of dependence in address between instructions included in a
thread executed by the self-processor, wherein if the at least one kind of
dependence is present in fact, then the detector detects a possibility of
presence of the at least one kind of dependence, and if the at least one kind
of dependence is not present in fact, then the detector is allowed to detect
the at least one kind of dependence.

It is possible that the instruction comprises a memory access 
instruction for access to a memory. 

It is possible that the detector includes : an execution history
storing unit including a plurality of entries which store an instruction
execution information of whether the memory access instruction has been
executed ; and an address converter for converting an address of the
memory access instruction into an entry number of the entry of the
execution history storing unit, so that the execution history storing unit
stores the instruction execution information into the entry designated by the
entry number.

It is possible that the address converter is adopted to convert a
same address of different memory access instructions into a same entry
number for allowing that the same address for the different memory access
instructions is stored in the same entry, whereby if the at least one kind of
dependence is not present in fact, then the detector is adopted to detect the
at least one kind of dependence.

It is possible that only if the memory access instruction has been
executed by a processor other than the self-processor, and the other processor
is to execute a thread which is prior in program sequence to the thread
executed by the self-processor, then the address converter is adopted to
convert an address of the memory access instruction into an entry number
of the entry, and the detector reads an instruction execution information,
whether or not the memory access instruction has been executed, from an
entry designated by the entry number.

It is possible that if the memory access instruction has been
executed by a processor other than the self-processor, then the address
converter is adopted to convert an address of the memory access instruction
into an entry number of the entry, and the detector reads an instruction
execution information, whether or not the memory access instruction has
been executed, from an entry designated by the entry number by referring
only to a value which has been read out by the memory access instruction
which had been executed by the other processor executing a thread which is
prior in program sequence to the thread executed by the self-processor.

It is possible that the memory access instruction comprises either
a load instruction or a store instruction. 

It is possible that the one kind of the dependence is a read after
write dependence, and if the load instruction has been executed by the self-
processor, then the address converter is adopted to convert an address of the
load instruction into an entry number, and the detector stores an instruction
execution information, that the load instruction has been executed by the
self-processor, into the entry designated by the entry number of the
execution history storing unit, and if the store instruction has been executed
by either the self-processor or the other processor, then the address
converter is adopted to convert an address of the store instruction into an
entry number, and the detector reads an instruction execution information,
whether or not the load instruction has been executed by the self-processor,
out from the entry designated by the entry number of the execution history
storing unit, whereby the detector detects a possibility of presence of the
read after write dependence from the store instruction as executed by either
the self-processor or the other processor into the load instruction as
executed by the self-processor.

It is possible that the one kind of the dependence is a write after
read dependence, and if the store instruction has been executed by the self-
processor, then the address converter is adopted to convert an address of the
store instruction into an entry number, and the detector stores an instruction
execution information, that the store instruction has been executed by the
self-processor, into the entry designated by the entry number of the
execution history storing unit, and if the load instruction has been executed
by either the self-processor or the other processor, then the address
converter is adopted to convert an address of the load instruction into an
entry number, and the detector reads an instruction execution information,
whether or not the store instruction has been executed by the self-processor,
out from the entry designated by the entry number of the execution history
storing unit, whereby the detector detects a possibility of presence of the
write after read dependence from the load instruction as executed by either
the self-processor or the other processor into the store instruction as
executed by the self-processor.

It is possible that the one kind of the dependence is a write after
write dependence, and if the store instruction has been executed by the
other processor, then the address converter is adopted to convert an address
of the store instruction into an entry number, and the detector reads an
instruction execution information, whether or not the store instruction has
been executed by the self-processor, out from the entry designated by the
entry number of the execution history storing unit, and if the store instruction
has been executed by the self-processor, then the address converter is
adopted to convert an address of the store instruction into an entry number,
and the detector reads an instruction execution information, whether or not
the store instruction has been executed by the self-processor, out from the
entry designated by the entry number of the execution history storing unit,
as well as the detector stores an instruction execution information, that the
store instruction has been executed by the self-processor, into the entry
designated by the entry number of the execution history storing unit,
whereby the detector detects a possibility of presence of the write after
write dependence from the store instruction as executed by either the self-
processor or the other processor to the store instruction as executed by the
self-processor.

It is possible that the one kind of the dependence is a read after
write dependence, a write after read dependence and a write after write
dependence and if the load instruction has been executed by either the self-
processor or the other processor, then the address converter is adopted to
convert an address of the load instruction into an entry number, and the
detector reads an instruction execution information, whether or not the
store instruction has been executed by the self-processor, from the entry
designated by the entry number of the execution history storing unit, and if the
store instruction has been executed by either the self-processor or the other
processor, then the address converter is adopted to convert an address of the
store instruction into an entry number, and the detector reads an instruction
execution information, whether or not the load instruction has been
executed by the self-processor, from the entry designated by the entry
number of the execution history storing unit, and if the load instruction has
been executed by the self-processor, then the address converter is adopted
to convert an address of the load instruction into an entry number, and the
detector stores an instruction execution information, that the load
instruction has been executed by the self-processor, into the entry
designated by the entry number of the execution history storing unit, and if the
store instruction has been executed by the self-processor, then the address
converter is adopted to convert an address of the store instruction into an
entry number, and the detector stores an instruction execution information,
that the store instruction has been executed by the self-processor, into the
entry designated by the entry number of the execution history storing unit,
whereby the detector detects a possibility of presence of the read after write
dependence from the store instruction as executed by either the self-
processor or the other processor into the load instruction as executed by the
self-processor, and also detects a possibility of presence of the write after
read dependence from the load instruction as executed by either the self-
processor or the other processor into the store instruction as executed by
the self-processor as well as detects a possibility of presence of the write
after write dependence from the store instruction as executed by either the
self-processor or the other processor into the store instruction as executed
by the self-processor.

It is possible that the address converter is adopted to select plural
bits of the address inputted and output the plural bits as indicating the entry
number.

It is possible that the address converter is adopted to take an
exclusive-OR of the selected plural bits and output a value of the exclusive-
OR as indicating the entry number.

A fourth aspect of the present invention is a processor including :
an instruction execution unit ; a processor control unit ; and a data
dependence detector, wherein for allowing the processor to execute the
instructions, the processor control unit supplies a kind of the instruction to
the instruction execution unit and the data dependence detector, and if the
instruction to be executed is a memory access instruction which comprises
either a load instruction for transferring data from a memory to a register or
a store instruction for transferring data from the register to the memory,
then the processor control unit supplies an address of the memory access
instruction to the instruction execution unit and the data dependence
detector as well as the processor control unit supplies the data dependence
detector with a speculative execution flag which indicates that the memory
access instruction is to be executed by a data dependence speculative
execution, and the data dependence detector further includes : an execution
history storing unit including a plurality of entries which store an instruction
execution information of whether the memory access instruction has been
executed ; and an address converter for converting an address of the
memory access instruction into an entry number of the entry of the
execution history storing unit, so that the execution history storing unit
refers to the speculative execution flag and an execution instruction from the
processor control unit, and allows read/write operation of the instruction
execution information to the entry designated by the entry number, and the
data dependence detector supplies the processor control unit with a value
which indicates whether the data dependence speculative execution has
resulted in success or failure.

It is possible that the address converter is adopted to convert a
same address of different memory access instructions into a same entry
number for allowing that the same address for the different memory access
instructions is stored in the same entry, whereby if the at least one kind of
dependence is not present in fact, then the detector is adopted to detect the
at least one kind of dependence.

It is possible that the address converter is adopted to select plural 
bits of the address inputted and output the plural bits as indicating the entry 
number. 

It is possible that the address converter is adopted to take an
exclusive-OR of the selected plural bits and output a value of the exclusive-
OR as indicating the entry number.

It is possible that in case that the processor is in a state of a
speculative instruction execution to the data dependence, and if the
processor has speculatively executed the load instruction, then the address
converter converts the address of the load instruction into an entry number
which is inputted into the instruction history storing unit, and the
instruction history storing unit stores a value, which indicates that the load
instruction has been speculatively executed, into an entry designated by the
entry number, and if the processor has speculatively executed the store
instruction, then the address converter converts the address of the store
instruction into an entry number which is inputted into the instruction
history storing unit, and the instruction history storing unit reads a stored
value out from an entry designated by the entry number, whereby the data
dependence detector detects, based on the read out value, a possibility of
presence of a read after write dependence from the store instruction to
the load instruction, and if the possibility of presence of the read after write
dependence is detected, the data dependence detector supplies the
processor control unit with a value which indicates that the data
dependence speculative execution has been in failure.
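
The load and store handling described above can be modelled in a few lines. The sketch below is an assumption-laden illustration, not the claimed hardware: the class name, the method name, the 16-entry table and the choice of address bits 2 to 5 as the entry number are all hypothetical.

```python
class SpeculativeRawDetector:
    """Toy model of a data dependence detector for one processor."""

    def __init__(self, index_bits=4, low_bit=2):
        self.index_bits = index_bits
        self.low_bit = low_bit
        # One flag per entry: "a load mapped here was speculatively executed".
        self.history = [False] * (1 << index_bits)

    def entry_number(self, address):
        # Address converter: select a few address bits (assumed positions).
        return (address >> self.low_bit) & ((1 << self.index_bits) - 1)

    def access(self, kind, address, speculative):
        """Process one memory access; return True if speculation has failed."""
        entry = self.entry_number(address)
        if kind == "load":
            if speculative:
                self.history[entry] = True   # record the speculative load
            return False
        if kind == "store":
            # A set flag means a later load may already have read stale data.
            return self.history[entry]
        raise ValueError("kind must be 'load' or 'store'")


detector = SpeculativeRawDetector()
detector.access("load", 0x104, speculative=True)
print(detector.access("store", 0x108, speculative=True))  # False: other entry
print(detector.access("store", 0x104, speculative=True))  # True: same address
```

The value returned for a store plays the role of the success or failure indication supplied to the processor control unit.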

It is possible that the data dependence detector initializes all of 
the entries of the instruction history storing unit based on the speculative 
execution flag. 

It is possible that if the data dependence detector supplies the
processor control unit with a value which indicates that the data
dependence speculative execution has been in failure, the processor control
unit and the instruction execution unit perform a recovery process for
failure of the data dependence speculative execution.

It is possible that if no presence of the read after write
dependence from the store instruction to the load instruction is detected,
then the data dependence detector supplies the processor control unit with a
value which indicates that the data dependence speculative execution has
been in success, and the processor control unit continues subsequent
instruction executions without execution of a recovery process.

A fifth aspect of the present invention is a multiple processor
system for thread parallel processings, the system including: a plurality of
processors, each of which further includes an instruction execution unit; a
processor control unit; and a data dependence detector; and a thread control
unit being adopted to allocate threads to the plurality of processors for
allowing each of the plurality of processors to execute at least an allocated
thread, and the thread control unit being also adopted to supply each of the
plurality of processors with both a respective speculative execution flag
which indicates whether or not the thread should be executed by a data
dependence speculative execution, and a thread sequence which indicates
execution sequences of the threads, whereby each of the processors receives
the speculative execution flag and the thread sequences and executes the at
least one thread allocated by the thread control unit, wherein the
processor control unit supplies a kind of an instruction to be executed by a
self-processor, in which the processor control unit is provided, to the
instruction execution unit and the data dependence detector which are
provided in the self-processor, as well as to at least a processor other
than the self-processor; the data dependence detector receives the kind of
the instruction to be executed by the self-processor, and also receives kinds
of instructions to be executed by the other processor from processor control
units provided in the other processor; if the instruction to be executed is
either a load instruction or a store instruction, the processor control unit
supplies an address of the instruction to the instruction execution unit and
the data dependence detector which are provided in the self-processor,
as well as to the other processor; the data dependence detector includes the
same number of address converters as a total number of the processors
included in the system, an instruction history storing unit and a logic
circuit; the data dependence detector judges whether the self-processor is
in a definitive execution state or a speculative execution state based on the
speculative execution flag supplied from the thread control unit; the data
dependence detector judges whether the thread executed by the self-
processor is prior in program sequence to a respective thread executed by
the other processor, based on the thread sequences, and the data
dependence detector supplies a data dependence detected result to the
thread control unit; one of the address converters converts an address of a
load instruction to be executed by the self-processor into an entry number
of the instruction history storing unit, and the remaining address
converters convert an address of a store instruction to be executed by the
other processor into an entry number of the instruction history storing unit;
the instruction history storing unit comprises a plurality of entries, a single
write port and plural read ports, the number of which is smaller by one than
the total number of the processors; the address converter, which has an input
port for receiving the address of the instruction to be executed by the self-
processor, has an output port connected to the write port of the instruction
history storing unit for write operation into an entry designated by the entry
number of the instruction history storing unit; the address converter, which
has an input port for receiving the address of the instruction to be executed
by the other processor, has an output port connected to the read port of the
instruction history storing unit for read operation from an entry designated
by the entry number of the instruction history storing unit; and the logic
circuit operates a logical-OR of the plural read ports of the address
converter and supplies an operated logic value as a data dependence
detected result to the thread control unit.
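
As a rough illustration of the structure described in this aspect, the sketch below models one processor's detector with a single write path for its own loads and one read path per other processor, whose results are combined by a logical OR. The class and method names, the dictionary-based thread sequence and the table size are assumptions made for the example; the sketch also assumes that only stores from threads prior in program sequence are taken into account.

```python
class ThreadRawDetector:
    """Toy model of one processor's detector in an N-processor system."""

    def __init__(self, self_id, num_processors, index_bits=4, low_bit=2):
        self.self_id = self_id
        # One read port per other processor: N - 1 ports in total.
        self.other_ids = [p for p in range(num_processors) if p != self_id]
        self.index_bits = index_bits
        self.low_bit = low_bit
        self.history = [False] * (1 << index_bits)

    def entry_number(self, address):
        return (address >> self.low_bit) & ((1 << self.index_bits) - 1)

    def record_own_load(self, address):
        # Single write port: mark the entry for a speculatively executed load.
        self.history[self.entry_number(address)] = True

    def check_stores(self, store_addresses, thread_sequence):
        """store_addresses maps processor id -> store address (or is absent).

        thread_sequence maps processor id -> position in program order.
        Returns the OR of all read port values (True means possible RAW).
        """
        my_position = thread_sequence[self.self_id]
        port_values = []
        for pid in self.other_ids:
            address = store_addresses.get(pid)
            is_prior = thread_sequence[pid] < my_position
            if address is None or not is_prior:
                port_values.append(False)
            else:
                port_values.append(self.history[self.entry_number(address)])
        return any(port_values)   # the logic circuit: logical OR
```

A usage example under the same assumptions: with four processors whose threads are ordered 0, 1, 2, 3, processor 2 records a load at 0x104; a store to 0x104 by processor 0 is then reported as a possible dependence, while the same store by processor 3 is ignored because its thread is later in program sequence.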

It is possible that the data dependence detector initializes all of 
the entries of the instruction history storing unit based on the speculative 
execution flag. 

It is possible that if the self-processor is in the speculative
execution state and the load instruction has speculatively been executed by
the self-processor, then the address converter in the data dependence
detector converts an address of the load instruction into an entry number of
the instruction history storing unit and supplies the entry number to the
write port of the instruction history storing unit, so that the instruction
history storing unit sets a value, indicating that the load instruction has
speculatively been executed, at an entry designated by the entry number, and
if the self-processor is in the speculative execution state and the store
instruction has been executed by another processor which is allocated with a
prior thread which is prior in program sequence to a thread executed by the
self-processor, then the address converter in the data dependence detector
converts an address of the store instruction into an entry number of the
instruction history storing unit and supplies the entry number to a
corresponding one of the input ports of the instruction history storing unit,
so that the instruction history storing unit reads a content from an entry
designated by the entry number and supplies the content to the logic circuit,
and the logic circuit operates a logical-OR of the plural read ports of the
address converter and supplies an operated logic value as a data
dependence detected result to the thread control unit.

It is possible that if an address of a store instruction executed by
another processor allocated with a prior thread which is prior in program
sequence to a thread executed by the self-processor is either equal to an
address of a load instruction subjected to a data dependence speculative
execution by the self-processor or allocated to the same entry of the
instruction history storing unit due to appearance of aliases, then a value,
indicating that the load instruction subjected to the data dependence
speculative execution is present, is read out from the instruction history
storing unit, a possibility of presence of a read after write dependence is
detected, and the data dependence detector supplies a value indicating
failure of the data dependence speculative execution to the thread control
unit, the thread control unit receives a notice of failure of the data
dependence speculative execution from either one of the processors, and
the thread control unit sends a request for recovery process against failure
of the data dependence speculative execution to a processor which has
supplied the notice of failure of the data dependence speculative execution
as well as to a processor executing a thread which is post in program
sequence to a thread executed by the processor which has supplied the
notice, and the processor control unit of the processor receives the request
for the recovery process to failure of the data dependence speculative
execution.
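
The selection of processors that receive the recovery request can be illustrated as below. This is a minimal sketch under assumptions: the function name and the representation of the thread sequence as a mapping from processor id to program-order position are hypothetical.

```python
def recovery_targets(failed_processor, thread_sequence):
    """Return the processors that should receive a recovery request.

    thread_sequence maps processor id -> position of its thread in program
    order (smaller means prior). The processor that reported the failure and
    every processor running a later thread are selected.
    """
    failed_position = thread_sequence[failed_processor]
    return sorted(
        pid for pid, position in thread_sequence.items()
        if position >= failed_position
    )


# Processor 1 runs the earliest thread, then 3, then 0, then 2.
sequence = {0: 2, 1: 0, 2: 3, 3: 1}
print(recovery_targets(0, sequence))   # [0, 2]
print(recovery_targets(1, sequence))   # [0, 1, 2, 3]
```

Processors running threads prior to the failed one are left alone, since their results do not depend on the mis-speculated load.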

A sixth aspect of the present invention is a semiconductor
integrated circuit including a detector for detecting at least one kind of
dependence in address between instructions executed by at least a
processor, the detector being adopted to detect a possibility of presence of
the at least one kind of dependence, wherein if the at least one kind of
dependence is present in fact, then the detector detects a possibility of
presence of the at least one kind of dependence, and if the at least one kind
of dependence is not present in fact, then the detector is allowed to detect
the at least one kind of dependence.

It is possible that the instruction comprises a memory access 
instruction for access to a memory. 

It is possible that the detector includes: an execution history
storing unit including a plurality of entries which store instruction
execution information of whether the memory access instruction has been
executed; and an address converter for converting an address of the
memory access instruction into an entry number of the entry of the
execution history storing unit, so that the execution history storing unit
stores the instruction execution information into the entry designated by the
entry number.

It is possible that the address converter is adopted to convert a
same address of different memory access instructions into a same entry
number for allowing that the same address for the different memory access
instructions is stored in the same entry, whereby if the at least one kind of
dependence is not present in fact, then the detector is adopted to detect the
at least one kind of dependence.
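
The property stated for this aspect, that a real dependence is always reported while a dependence may also be reported where none exists, follows from comparing entry numbers instead of full addresses. The sketch below illustrates this with an assumed 8-entry table indexed by address bits 2 to 4; the function names and sizes are hypothetical.

```python
def entry_number(address, index_bits=3, low_bit=2):
    # Address converter: keep only a few address bits (assumed positions).
    return (address >> low_bit) & ((1 << index_bits) - 1)


def may_depend(first_address, second_address):
    """Conservative check: True whenever both accesses share an entry."""
    return entry_number(first_address) == entry_number(second_address)


def count_false_positives(addresses):
    """Count ordered pairs of different addresses that share an entry."""
    return sum(
        1
        for a in addresses
        for b in addresses
        if a != b and may_depend(a, b)
    )


# Equal addresses always map to the same entry, so no dependence is missed.
print(all(may_depend(a, a) for a in range(0, 1024, 4)))   # True
# 0x10 and 0x30 differ but share entry 4, so a dependence is reported.
print(may_depend(0x10, 0x30))                             # True
print(may_depend(0x10, 0x14))                             # False
```

A larger table or an exclusive-OR based converter reduces the number of such spurious detections, but never removes the guarantee for equal addresses.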

It is possible that the memory access instruction comprises either
a load instruction or a store instruction.

It is possible that the one kind of the dependence is a read after
write dependence, and if the load instruction has been executed, then the
address converter is adopted to convert an address of the load instruction
into an entry number, and the detector stores an instruction execution
information, that the load instruction has been executed, into the entry
designated by the entry number of the execution history storing unit, and if
the store instruction has been executed, then the address converter is
adopted to convert an address of the store instruction into an entry number,
and the detector reads an instruction execution information, whether or not
the load instruction has been executed, out from the entry designated by the
entry number of the execution history storing unit, whereby the detector
detects a possibility of presence of the read after write dependence from the
store instruction as executed into the load instruction as executed.

It is possible that the one kind of the dependence is a write after
read dependence, and if the store instruction has been executed, then the
address converter is adopted to convert an address of the store instruction
into an entry number, and the detector stores an instruction execution
information, that the store instruction has been executed, into the entry
designated by the entry number of the execution history storing unit, and if
the load instruction has been executed, then the address converter is
adopted to convert an address of the load instruction into an entry number,
and the detector reads an instruction execution information, whether or not
the store instruction has been executed, out from the entry designated by
the entry number of the execution history storing unit, whereby the detector
detects a possibility of presence of the write after read dependence from the
load instruction as executed into the store instruction as executed.

It is possible that the one kind of the dependence is a write after
write dependence, and if the store instruction has been executed, then the
address converter is adopted to convert an address of the store instruction
into an entry number, and the detector reads an instruction execution
information, whether or not the store instruction has been executed, out
from the entry designated by the entry number of the execution history
storing unit, and also the detector stores an instruction execution
information, that the store instruction has been executed, into the entry
designated by the entry number of the execution history storing unit,
whereby the detector detects a possibility of presence of the write after
write dependence between the store instructions as executed.

It is possible that the one kind of the dependence is a read after
write dependence, a write after read dependence and a write after write
dependence, and if the load instruction has been executed, then the address
converter is adopted to convert an address of the load instruction into an
entry number, and the detector reads an instruction execution information,
whether or not the store instruction has been executed, from the entry
designated by the entry number of the execution history storing unit, and
also the detector stores an instruction execution information, that the load
instruction has been executed, into the entry designated by the entry
number of the execution history storing unit, and if the store instruction has
been executed, then the address converter is adopted to convert an address
of the store instruction into an entry number, and the detector reads
instruction execution information, whether or not the load instruction has
been executed and whether or not the store instruction has been executed,
out from the entry designated by the entry number of the execution history
storing unit, and also the detector stores an instruction execution
information, that the store instruction has been executed, into the entry
designated by the entry number of the execution history storing unit,
whereby the detector detects a possibility of presence of the read after write
dependence from the store instruction as executed into the load instruction
as executed, and also detects a possibility of presence of the write after read
dependence from the load instruction as executed into the store instruction
as executed as well as detects a possibility of presence of the write after
write dependence between the store instructions as executed.
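
The combined detection of all three kinds of dependence can be sketched with two flags per entry, one for executed loads and one for executed stores. The following model is illustrative only; the class name, method names, table size and address bit selection are assumptions.

```python
class DependenceDetector:
    """Toy model that reports possible RAW, WAR and WAW dependences."""

    def __init__(self, index_bits=4, low_bit=2):
        self.index_bits = index_bits
        self.low_bit = low_bit
        size = 1 << index_bits
        self.loaded = [False] * size   # "a load mapped here was executed"
        self.stored = [False] * size   # "a store mapped here was executed"

    def entry_number(self, address):
        return (address >> self.low_bit) & ((1 << self.index_bits) - 1)

    def on_load(self, address):
        entry = self.entry_number(address)
        # An earlier executed store at this entry: possible write after read.
        found = {"WAR"} if self.stored[entry] else set()
        self.loaded[entry] = True
        return found

    def on_store(self, address):
        entry = self.entry_number(address)
        found = set()
        if self.loaded[entry]:
            found.add("RAW")   # an earlier executed load at this entry
        if self.stored[entry]:
            found.add("WAW")   # an earlier executed store at this entry
        self.stored[entry] = True
        return found


detector = DependenceDetector()
print(detector.on_load(0x104))    # set()
print(detector.on_store(0x104))   # {'RAW'}
print(detector.on_load(0x104))    # {'WAR'}
```

Reading the flags before updating them matters: each access is checked against earlier accesses only, never against itself.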

It is possible that the address converter is adopted to select plural
bits of the address inputted and output the plural bits as indicating the entry
number.

It is possible that the address converter is adopted to take an
exclusive-OR of the selected plural bits and output a value of the exclusive-
OR as indicating the entry number.

A seventh aspect of the present invention is a semiconductor
integrated circuit including a detector provided in a self-processor included
in a multiple processor system including the self-processor and at least a
processor other than the self-processor, and the multiple processor system
performing parallel processings in thread units of program, and the detector
detecting at least one kind of dependence in address between an instruction
included in a thread executed by the self-processor and an instruction
included in a thread executed by the other processor, the detector being
adopted to detect a possibility of presence of the at least one kind of
dependence, wherein if the at least one kind of dependence is present in
fact, then the detector detects a possibility of presence of the at least one
kind of dependence, and if the at least one kind of dependence is not
present in fact, then the detector is allowed to detect the at least one kind of
dependence.

It is possible that the instruction comprises a memory access
instruction for access to a memory.

It is possible that the detector includes: an execution history
storing unit including a plurality of entries which store instruction
execution information of whether the memory access instruction has been
executed; and an address converter for converting an address of the
memory access instruction into an entry number of the entry of the
execution history storing unit, so that the execution history storing unit
stores the instruction execution information into the entry designated by the
entry number.

It is possible that the address converter is adopted to convert a
same address of different memory access instructions into a same entry
number for allowing that the same address for the different memory access
instructions is stored in the same entry, whereby if the at least one kind of
dependence is not present in fact, then the detector is adopted to detect the
at least one kind of dependence.

It is possible that only if the memory access instruction has been
executed by a processor other than the self-processor, and the other processor
is to execute a thread which is prior in program sequence to the thread
executed by the self-processor, then the address converter is adopted to
convert an address of the memory access instruction into an entry number
of the entry, and the detector reads an instruction execution information,
whether or not the memory access instruction has been executed, from an
entry designated by the entry number.

It is possible that if the memory access instruction has been
executed by a processor other than the self-processor, then the address
converter is adopted to convert an address of the memory access instruction
into an entry number of the entry, and the detector reads an instruction
execution information, whether or not the memory access instruction has
been executed, from an entry designated by the entry number by referring
only to a value which has been read out by the memory access instruction
which had been executed by another processor executing a thread which is
prior in program sequence to the thread executed by the self-processor.
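
The two preceding alternatives differ only in where the thread order is applied: either the history is read only for accesses from a prior thread, or it is read for every access from another processor and the value is referred to only when the access came from a prior thread. A minimal sketch, with hypothetical function names and a plain list standing in for the execution history storing unit, shows that both give the same detected result.

```python
def gated_lookup(history, entry, other_is_prior):
    """First alternative: read the entry only for a prior thread's access."""
    if not other_is_prior:
        return False            # no read is performed at all
    return history[entry]


def masked_lookup(history, entry, other_is_prior):
    """Second alternative: always read, but refer to the value only if the
    access came from a thread prior in program sequence."""
    value = history[entry]      # the read is performed unconditionally
    return value and other_is_prior


history = [False, True]         # entry 1 holds a recorded access
print(gated_lookup(history, 1, True), masked_lookup(history, 1, True))
print(gated_lookup(history, 1, False), masked_lookup(history, 1, False))
```

Which alternative is preferable in hardware depends on whether it is cheaper to suppress the read or to mask its result; the sketch makes no claim about that trade-off.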

It is possible that the memory access instruction comprises either
a load instruction or a store instruction.

It is possible that the one kind of the dependence is a read after
write dependence, and if the load instruction has been executed by the self-
processor, then the address converter is adopted to convert an address of the
load instruction into an entry number, and the detector stores an instruction
execution information, that the load instruction has been executed by the
self-processor, into the entry designated by the entry number of the
execution history storing unit, and if the store instruction has been executed
by the other processor, then the address converter is adopted to convert an
address of the store instruction into an entry number, and the detector reads
an instruction execution information, whether or not the load instruction
has been executed by the self-processor, out from the entry designated by
the entry number of the execution history storing unit, whereby the detector
detects a possibility of presence of the read after write dependence from the
store instruction as executed by the other processor into the load instruction
as executed by the self-processor.

It is possible that the one kind of the dependence is a write after
read dependence, and if the store instruction has been executed by the self-
processor, then the address converter is adopted to convert an address of the
store instruction into an entry number, and the detector stores an instruction
execution information, that the store instruction has been executed by the
self-processor, into the entry designated by the entry number of the
execution history storing unit, and if the load instruction has been executed
by the other processor, then the address converter is adopted to convert an
address of the load instruction into an entry number, and the detector reads
an instruction execution information, whether or not the store instruction
has been executed by the self-processor, out from the entry designated by
the entry number of the execution history storing unit, whereby the detector
detects a possibility of presence of the write after read dependence from the
load instruction as executed by the other processor into the store instruction
as executed by the self-processor.

It is possible that the one kind of the dependence is a write after
write dependence, and if the store instruction has been executed by the
other processor, then the address converter is adopted to convert an address
of the store instruction into an entry number, and the detector reads an
instruction execution information, whether or not the store instruction has
been executed by the self-processor, out from the entry designated by the
entry number of the execution history storing unit, and if the store instruction
has been executed by the self-processor, then the address converter is
adopted to convert an address of the store instruction into an entry number,
and the detector stores an instruction execution information, that the store
instruction has been executed by the self-processor, into the entry
designated by the entry number of the execution history storing unit,
whereby the detector detects a possibility of presence of the write after
write dependence from the store instruction as executed by the other
processor to the store instruction as executed by the self-processor.

It is possible that the one kind of the dependence is a read after
write dependence, a write after read dependence and a write after write
dependence, and if the load instruction has been executed by the other
processor, then the address converter is adopted to convert an address of the
load instruction into an entry number, and the detector reads an instruction
execution information, whether or not the store instruction has been
executed by the self-processor, from the entry designated by the entry
number of the execution history storing unit, if the store instruction has
been executed by the other processor, then the address converter is adopted
to convert an address of the store instruction into an entry number, and the
detector reads an instruction execution information, whether or not the load
instruction has been executed by the self-processor, from the entry
designated by the entry number of the execution history storing unit, if the
load instruction has been executed by the self-processor, then the address
converter is adopted to convert an address of the load instruction into an
entry number, and the detector stores an instruction execution information,
that the load instruction has been executed by the self-processor, into the
entry designated by the entry number of the execution history storing unit,
if the store instruction has been executed by the self-processor, then the
address converter is adopted to convert an address of the store instruction
into an entry number, and the detector stores an instruction execution
information, that the store instruction has been executed by the self-
processor, into the entry designated by the entry number of the execution
history storing unit, whereby the detector detects a possibility of presence
of the read after write dependence from the store instruction as executed by
the other processor into the load instruction as executed by the self-
processor, and also detects a possibility of presence of the write after read
dependence from the load instruction as executed by the other processor
into the store instruction as executed by the self-processor as well as
detects a possibility of presence of the write after write dependence from
the store instruction as executed by the other processor into the store
instruction as executed by the self-processor.

It is possible that the address converter is adopted to select plural 
bits of the address inputted and output the plural bits as indicating the entry 
number. 

It is possible that the address converter is adopted to take an
exclusive-OR of the selected plural bits and output a value of the exclusive-
OR as indicating the entry number.

An eighth aspect of the present invention is a semiconductor
integrated circuit including a detector provided in a self-processor included
in a multiple processor system including the self-processor and at least a
processor other than the self-processor, and the multiple processor system
performing parallel processings in thread units of program, and the detector
detecting at least one kind of dependence in address between an instruction
included in a thread executed by the self-processor and an instruction
included in a thread executed by the other processor, the detector being
adopted to detect a possibility of presence of the at least one kind of
dependence, as well as the detector detecting at least one kind of
dependence in address between instructions included in a thread executed
by the self-processor, wherein if the at least one kind of dependence is
present in fact, then the detector detects a possibility of presence of the at
least one kind of dependence, and if the at least one kind of dependence is
not present in fact, then the detector is allowed to detect the at least one
kind of dependence.

It is possible that the instruction comprises a memory access
instruction for access to a memory.

It is possible that the detector includes: an execution history
storing unit including a plurality of entries which store instruction
execution information of whether the memory access instruction has been
executed; and an address converter for converting an address of the
memory access instruction into an entry number of the entry of the
execution history storing unit, so that the execution history storing unit
stores the instruction execution information into the entry designated by the
entry number.

It is possible that the address converter is adopted to convert a
same address of different memory access instructions into a same entry
number for allowing that the same address for the different memory access
instructions is stored in the same entry, whereby if the at least one kind of
dependence is not present in fact, then the detector is adopted to detect the
at least one kind of dependence.

It is possible that only if the memory access instruction has been
executed by a processor other than the self-processor, and the other processor
is to execute a thread which is prior in program sequence to the thread
executed by the self-processor, then the address converter is adopted to
convert an address of the memory access instruction into an entry number
of the entry, and the detector reads an instruction execution information,
whether or not the memory access instruction has been executed, from an
entry designated by the entry number.

It is possible that if the memory access instruction has been
executed by a processor other than the self-processor, then the address
converter is adopted to convert an address of the memory access instruction
into an entry number of the entry, and the detector reads an instruction
execution information, whether or not the memory access instruction has
been executed, from an entry designated by the entry number by referring
only to a value which has been read out by the memory access instruction
which had been executed by another processor executing a thread which is
prior in program sequence to the thread executed by the self-processor.

It is possible that the memory access instruction comprises either 
a load instruction or a store instruction. 

It is possible that the one kind of the dependence is a read after
write dependence, and if the load instruction has been executed by the self-
processor, then the address converter is adopted to convert an address of the
load instruction into an entry number, and the detector stores an instruction
execution information, that the load instruction has been executed by the
self-processor, into the entry designated by the entry number of the
execution history storing unit, and if the store instruction has been executed
by either the self-processor or the other processor, then the address
converter is adopted to convert an address of the store instruction into an
entry number, and the detector reads an instruction execution information,
whether or not the load instruction has been executed by the self-processor,
out from the entry designated by the entry number of the execution history
storing unit, whereby the detector detects a possibility of presence of the
read after write dependence from the store instruction as executed by either
the self-processor or the other processor into the load instruction as
executed by the self-processor.

It is possible that the one kind of the dependence is a write after
read dependence, and if the store instruction has been executed by the
self-processor, then the address converter is adapted to convert an address
of the store instruction into an entry number, and the detector stores
instruction execution information, indicating that the store instruction
has been executed by the self-processor, into the entry designated by the
entry number of the execution history storing unit, and if the load
instruction has been executed by either the self-processor or the other
processor, then the address converter is adapted to convert an address of
the load instruction into an entry number, and the detector reads
instruction execution information, indicating whether or not the store
instruction has been executed by the self-processor, out from the entry
designated by the entry number of the execution history storing unit,
whereby the detector detects a possibility of presence of the write after
read dependence from the load instruction as executed by either the
self-processor or the other processor to the store instruction as executed
by the self-processor.

It is possible that the one kind of the dependence is a write after
write dependence, and if the store instruction has been executed by the
other processor, then the address converter is adapted to convert an address
of the store instruction into an entry number, and the detector reads
instruction execution information, indicating whether or not the store
instruction has been executed by the self-processor, out from the entry
designated by the entry number of the execution history storing unit, and if
the store instruction has been executed by the self-processor, then the
address converter is adapted to convert an address of the store instruction
into an entry number, and the detector reads instruction execution
information, indicating whether or not the store instruction has been
executed by the self-processor, out from the entry designated by the entry
number of the execution history storing unit, and the detector also stores
instruction execution information, indicating that the store instruction
has been executed by the self-processor, into the entry designated by the
entry number of the execution history storing unit, whereby the detector
detects a possibility of presence of the write after write dependence from
the store instruction as executed by either the self-processor or the other
processor to the store instruction as executed by the self-processor.
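
The write after write case above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the names `WawDetector`, `entry_number`, `store_by_other` and `store_by_self` are hypothetical, the execution history storing unit is modelled as four one-bit entries, and the address converter is assumed to select address bits 2 and 3.

```python
class WawDetector:
    """Illustrative model of the write-after-write check (hypothetical names)."""

    def __init__(self, n_bits=2):
        self.n_bits = n_bits
        # One bit per entry: 1 means a store by the self-processor was recorded.
        self.history = [0] * (1 << n_bits)

    def entry_number(self, address):
        # Assumed address converter: n_bits address bits starting at bit 2.
        return (address >> 2) & ((1 << self.n_bits) - 1)

    def store_by_other(self, address):
        # A store executed by the other processor only reads the entry.
        return self.history[self.entry_number(address)] == 1

    def store_by_self(self, address):
        # A store executed by the self-processor reads the entry first,
        # then records its own execution in the same entry.
        entry = self.entry_number(address)
        possible_waw = self.history[entry] == 1
        self.history[entry] = 1
        return possible_waw
```

A return value of `True` means only that a write after write dependence is possibly present, because two different addresses may share one entry.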

It is possible that the one kind of the dependence is a read after
write dependence, a write after read dependence and a write after write
dependence, and if the load instruction has been executed by either the
self-processor or the other processor, then the address converter is adapted
to convert an address of the load instruction into an entry number, and the
detector reads instruction execution information, indicating whether or not
the store instruction has been executed by the self-processor, from the
entry designated by the entry number of the execution history storing unit;
if the store instruction has been executed by either the self-processor or
the other processor, then the address converter is adapted to convert an
address of the store instruction into an entry number, and the detector
reads instruction execution information, indicating whether or not the load
instruction has been executed by the self-processor, from the entry
designated by the entry number of the execution history storing unit; if the
load instruction has been executed by the self-processor, then the address
converter is adapted to convert an address of the load instruction into an
entry number, and the detector stores instruction execution information,
indicating that the load instruction has been executed by the
self-processor, into the entry designated by the entry number of the
execution history storing unit; if the store instruction has been executed
by the self-processor, then the address converter is adapted to convert an
address of the store instruction into an entry number, and the detector
stores instruction execution information, indicating that the store
instruction has been executed by the self-processor, into the entry
designated by the entry number of the execution history storing unit,
whereby the detector detects a possibility of presence of the read after
write dependence from the store instruction as executed by either the
self-processor or the other processor to the load instruction as executed by
the self-processor, and also detects a possibility of presence of the write
after read dependence from the load instruction as executed by either the
self-processor or the other processor to the store instruction as executed
by the self-processor, as well as detects a possibility of presence of the
write after write dependence from the store instruction as executed by
either the self-processor or the other processor to the store instruction as
executed by the self-processor.

It is possible that the address converter is adapted to select plural
bits of the inputted address and to output the plural bits as indicating the
entry number.

It is possible that the address converter is adapted to take an
exclusive-OR of the selected plural bits and to output a value of the
exclusive-OR as indicating the entry number.
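
The two address converter variants above can be sketched as follows. This is a minimal illustration, not the circuit of the invention: the function names `select_bits` and `xor_fold`, the bit positions used in the examples, and the reading of the exclusive-OR variant as combining equally sized groups of selected bits are all assumptions. Bit positions are counted from zero at the least significant bit.

```python
from typing import Sequence


def select_bits(address: int, positions: Sequence[int]) -> int:
    """Concatenate the chosen address bits, first position most significant."""
    entry = 0
    for pos in positions:
        entry = (entry << 1) | ((address >> pos) & 1)
    return entry


def xor_fold(address: int, groups: Sequence[Sequence[int]]) -> int:
    """Exclusive-OR several equally sized groups of selected address bits."""
    entry = 0
    for positions in groups:
        entry ^= select_bits(address, positions)
    return entry
```

For example, `select_bits(0b00101000, [3, 2])` yields the 2-bit entry number `0b10`, while `xor_fold(0b00101000, [[3, 2], [5, 4]])` combines bits 3 and 2 with bits 5 and 4, so that more address bits influence the entry number.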



FIRST EMBODIMENT

A first embodiment according to the present invention will be
described in detail with reference to the drawings. FIG. 3 is a schematic
view of an internal configuration of a data processor including a data
dependence detector in a first embodiment in accordance with the present
invention. The data processor includes a processor control unit 17, an
instruction execution unit 18 and a data dependence detector 10, as well as
other units usually provided in a data processor but not relevant to the
present invention.

In order to cause the processor to execute the instruction, the
processor control unit 17 supplies the instruction execution unit 18 and the
data dependence detector 10 with the type of the instruction to be executed.
The data dependence detector 10 receives the type of the instruction
through an execution instruction 14.






If the type of the instruction relates to a load instruction or a store
instruction, then the processor control unit 17 supplies its address for
loading and storing data to the data dependence detector 10 and the
instruction execution unit 18. The data dependence detector 10 receives the
input of the address through a load/store instruction 15.

If the load instruction or the store instruction is subjected to a
data dependence speculative execution, then the processor control unit 17
notifies the data dependence detector 10 that the load instruction or the
store instruction is subjected to a data dependence speculative execution.

The data dependence detector 10 may further include a hash
function circuit 11 and an instruction history table 12. The data dependence
detector 10 has three inputs of the speculative execution flag 13, the
execution instruction 14 and the load/store instruction 15, which are
supplied from the processor control unit 17. The data dependence detector
10 has an output of the data dependence detected result 16 which is
supplied to the processor control unit 17.

The hash function circuit 11 comprises a logic circuit for
realizing a hash function "f" which converts an m-bit address for the load
instruction or the store instruction into an n-bit entry number for the
instruction history table 12, wherein "n" is usually smaller than "m". The
hash function "f" takes the same value for the same input. If N1 = f(A1)
and N2 = f(A2), then N1 = N2, provided that A1 = A2. The hash function
circuit 11 ensures that the same entry number of the instruction history
table is outputted for the same address of the load/store instructions.

If A1 ≠ A2, then N1 ≠ N2 is generally not ensured. It is possible
that the same entry number of the instruction history table 12 may exist for
different addresses of the load/store instructions.

The instruction history table 12 may comprise plural entries, each
of which stores one-bit information. Each of the plural entries may be
realized by any available storage means, for example, a memory or a
flip-flop. If the output from the hash function circuit 11 has an n-bit
width, then a total number of the entries is equal to the n-th power of 2.
The instruction history table 12 performs a read/write operation to the
entry designated by the entry number which has been outputted from the hash
function circuit 11.

The instruction history table 12 is capable of initializing all of the
entries upon receipt of the input of the speculative execution flag 13,
wherein the same value is written into all of the entries.
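
The behaviour of the instruction history table described above can be modelled in a few lines. This is a sketch only: the class name `InstructionHistoryTable` and its method names are hypothetical, and a Python list stands in for the memory or flip-flops.

```python
class InstructionHistoryTable:
    """Software model of a table with 2**n one-bit entries (hypothetical names)."""

    def __init__(self, n_bits):
        # An n-bit entry number addresses 2**n entries, all initially 0.
        self.entries = [0] * (1 << n_bits)

    def write(self, entry_number):
        # Record that a load mapped to this entry has been executed.
        self.entries[entry_number] = 1

    def read(self, entry_number):
        # Return the one-bit state of the designated entry.
        return self.entries[entry_number]

    def initialize(self):
        # Write the same value (here 0) into all of the entries.
        self.entries = [0] * len(self.entries)
```

Only the designated entry is touched by `write` and `read`, which is why no address comparison or combining logic is needed in this model.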

Operations of the data dependence detector 10 will be described
with reference to FIGS. 3 and 4. FIG. 4 is a timing chart showing
operations of the data dependence detector.

In an initial state, all of the entries of the instruction history table
12 are in the first state, for example, logic "0" or "L", which indicates
that the load instruction has not been executed. If the processor does not
perform the speculative instruction execution with respect to the data
dependence and does perform the instruction execution in the order defined
by the program (hereinafter referred to as "definitive execution state"),
then the data dependence detector 10 does not perform any operation. In
FIG. 4, cycles "0" to "3" correspond to the definitive execution state.

If the processor does perform the speculative instruction
execution with respect to the data dependence and does not perform the
instruction execution in the order defined by the program (hereinafter
referred to as "speculative execution state"), then the data dependence
detector 10 does perform a detecting process for a read after write
dependence from the store instruction to the load instruction. In FIG. 4,
cycles "4" to "8" correspond to the speculative execution state.

The speculative execution flag 13 indicates whether the
processor is in the definitive execution state or the speculative execution
state. In FIG. 4, if the speculative execution flag 13 is the logical value
"0", then this indicates the definitive execution state of the processor.
The logical value "1" of the speculative execution flag 13 indicates the
speculative execution state of the processor.

If the speculative load instruction is executed in the speculative
execution state (cycle 5), then the data dependence detector 10 stores, into
the instruction history table 12, information to the effect that the load
instruction has speculatively been executed. In more detail, the address
"A1" of the load instruction is converted by the hash function circuit 11
into the entry number "N1" which is inputted into the instruction history
table 12. The instruction history table 12 changes the entry corresponding
to the inputted entry number "N1" into a second state, for example, a
logical value "1" or "H", which indicates that the load instruction has
speculatively been executed.

If the store instruction has been executed (cycle 7), then the data
dependence detector 10 reads out the state stored in the corresponding
entry. In more detail, the address "A2" of the store instruction is
converted by the hash function circuit 11 into the entry number "N2" which
is inputted into the instruction history table 12. The instruction history
table 12 reads out the content of the entry designated by the inputted entry
number "N2", and sends the read out content through the data dependence
detected result 16 to the processor control unit 17.

If the address "A1" of the load instruction executed in the cycle 5
is equal to the address "A2" of the store instruction executed in the cycle
7 (A1 = A2), then the respectively corresponding entry numbers N1 and N2
are equal to each other in accordance with the property of the hash function
circuit 11. The logical value "1" stored in the entry "N1" of the
instruction history table 12 by the load instruction in the cycle 5 is read
out from the instruction history table 12 by the store instruction executed
in the cycle 7. Thus, there is detected a possibility that the read after
write dependence may be present from the store instruction executed in the
cycle 7 to the load instruction executed in the cycle 5.

What is detected here is not "the read after write dependence
exists" but "there is a possibility of existence of the read after write
dependence", because it is possible that the read after write dependence is
not present even if the value indicating the execution of the speculative
load instruction has been read out from the instruction history table 12 by
the store instruction.

In any event, when the possibility of presence of the read after write
dependence is detected, the data dependence detector 10 supplies the data
dependence detected result 16 with a value, for example, logical value "1",
which indicates that the data dependence speculative execution has not
succeeded. The processor control unit 17 and the instruction execution unit
18 execute the necessary recovery processes for the failure of the data
dependence speculative execution.

If the address "A1" of the load instruction executed in the cycle 5
is different from the address "A2" of the store instruction executed in the
cycle 7 (A1 ≠ A2), then, in accordance with the property of the hash
function circuit 11, there are both the case that the respectively
corresponding entry numbers N1 and N2 are different from each other and the
case that the respectively corresponding entry numbers N1 and N2 are equal
to each other.

In case that the respectively corresponding entry numbers N1 and
N2 are different from each other, when the store instruction is executed in
the cycle 7, the entry N2 of the instruction history table 12 has the initial
value, for example, "0". It is detected that the read after write dependence
is not present from the store instruction executed in the cycle 7 to the load
instruction executed in the cycle 5. In this case, the data dependence
detector 10 supplies the data dependence detected result 16 with a value,
for example, logical value "0", which indicates that the data dependence
speculative execution has succeeded. The processor control unit 17 does not
execute the recovery processes because of the success of the data
dependence speculative execution and will execute the subsequent
instructions.

In case that the respectively corresponding entry numbers N1 and
N2 are equal to each other, the load instruction in the cycle 5 and the store
instruction in the cycle 7 refer to the same entry (N1 = N2) of the
instruction history table 12. Even if the address "A1" of the load
instruction is different from the address "A2" of the store instruction and
the read after write dependence from the store instruction to the load
instruction is not present, when the store instruction is executed in the
cycle 7, the logic "1" having been written by the load instruction in the
cycle 5 is read out from the instruction history table 12. It is detected
that there is a possibility of existence of the read after write dependence
from the store instruction executed in the cycle 7 to the load instruction
executed in the cycle 5. In this case, the data dependence detector 10
supplies the data dependence detected result 16 with a value, for example,
logical value "1", which indicates that the data dependence speculative
execution has not succeeded. The processor control unit 17 and the
instruction execution unit 18 do execute the recovery processes for the
failure of the data dependence speculative execution. The situation that
different addresses are allocated to the same entry of the instruction
history table will hereinafter be referred to as "areas are generated".
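
The three cases discussed above (equal addresses, different addresses with different entries, and different addresses sharing one entry) can be reproduced with a small sketch. The function names `f` and `store_after_speculative_load` are hypothetical, and the hash function is assumed, purely for illustration, to select address bits 2 and 3 of an 8-bit address.

```python
def f(address, n_bits=2):
    # Assumed hash function: n_bits address bits starting at bit 2.
    return (address >> 2) & ((1 << n_bits) - 1)


def store_after_speculative_load(load_address, store_address, n_bits=2):
    """Return the detected result seen by a store that follows one load."""
    table = [0] * (1 << n_bits)          # all entries in the first state
    table[f(load_address, n_bits)] = 1   # cycle 5: speculative load recorded
    return table[f(store_address, n_bits)]  # cycle 7: store reads its entry


# Same address: the dependence is possibly present (and really is).
same = store_after_speculative_load(0b00010000, 0b00010000)
# Different addresses, different entries: no dependence reported.
distinct = store_after_speculative_load(0b00010000, 0b00001100)
# Different addresses, same entry: reported although not actually present.
shared = store_after_speculative_load(0b00100000, 0b00010000)
```

In the third case the detected result equals "1" even though the addresses differ, which corresponds to the unnecessary but harmless recovery process.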

In case of generation of the areas between the load instruction
and the store instruction, the processor control unit 17 is notified to the
effect that the read after write dependence is present even if the read
after write dependence is actually not present. The unnecessary recovery
process is thus executed. The meaning of the program is, however,
unchanged. Accordingly, the correctness of the execution result of the
program can be ensured.

The execution of the unnecessary recovery process reduces the
execution performance of the program. It is, however, possible to reduce
the probability of generation of the areas by appropriate selection of the
hash function of the hash function circuit 11 and by a sufficiently large
number of the entries on the instruction history table 12. A sufficiently
small probability of generation of the areas allows a substantive
improvement of the execution performance of the program due to the effect
of the data dependence speculative execution.
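
The effect of the number of entries can be estimated with a rough calculation. The sketch below is not taken from the description: it assumes an idealized hash function that maps each recorded load independently and uniformly onto the 2^n entries, and a store whose address is unrelated to all of those loads. The function name `false_detection_probability` is hypothetical.

```python
def false_detection_probability(n_bits, loads_recorded):
    """Chance that an unrelated store reads an entry set by some earlier load.

    Assumes each recorded load lands on a uniformly random, independent entry.
    """
    entries = 1 << n_bits
    return 1.0 - (1.0 - 1.0 / entries) ** loads_recorded
```

Under these assumptions, a table with four entries and one recorded load gives a probability of 0.25, and enlarging the table lowers the probability for the same number of recorded loads.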

It is possible that the areas may be generated between the load
instructions. If, for example, a load instruction "LD1" with an address
"A1" and a load instruction "LD2" with an address "A2" different from the
address "A1" are executed, then it is possible that such areas are generated
that the referred entries "N1" and "N2" are equal to each other. In this
case, the logic "1" indicating that the load instruction has been executed
is stored into the same entry N1 (= N2) of the instruction history table 12.

If the store instruction has been executed for the address "A1" or
"A2" having the read after write dependence on the load instruction "LD1"
or "LD2", then the entry N1 (= N2) on the instruction history table is
referred to in order to detect the presence of the read after write
dependence, although it is impossible to distinguish whether the read after
write dependence belongs to the load instruction "LD1" or "LD2".

The presence of the read after write dependence is surely
detected so that the recovery process for the failure of the data
dependence speculative execution is executed. As a result, the correctness
of the executed result of the program is ensured.

As described above, the data dependence detector does not
strictly detect the presence of the read after write dependence from the
store instruction to the load instruction. The data dependence detector does
detect the possibility of the presence of the read after write dependence
from the store instruction to the load instruction.

The addresses of the load instructions are allocated by the hash
function circuit 11 into a specific entry of the instruction history table 12
with overlap, for which reason it is unnecessary to store all of the addresses
of the executed load instructions and also to provide any address
comparator for comparing those addresses to the addresses of the store
instructions.

The data dependence detector provides an effect that the
necessary hardware size for judging the success or the failure of the data
dependence speculative execution is small. The number of the load
instructions which may be subjected to the data dependence speculative
execution is not limited by the hardware size of the data dependence
detector. The small hardware size allows the improvement in execution
performance of the data dependence speculative execution.

The data dependence detector also provides another effect that it
is unnecessary to compare the address of the store instruction and the
address of the load instruction in the detection process for the read after
write dependence in the execution of the store instruction.

Further, only one entry is to be read out from the instruction
history table, for which reason it is unnecessary to perform a logical-OR
operation. The necessary time for detecting the read after write dependence
is short and it is easy to obtain desirable high speed and high frequency
performances of the processor.

An example of the operation of the data dependence detector will
be described with reference to FIG. 5, which is a block diagram illustrative
of a structure of the data dependence detector. A data dependence detector
20 comprises a hash function circuit 21 and an instruction history table 22.
The data dependence detector 20, the hash function circuit 21 and the
instruction history table 22 in FIG. 5 correspond to the data dependence
detector 10, the hash function circuit 11 and the instruction history table
12 in FIG. 3. An execution instruction 24, a speculative execution flag 23
and a data dependence detected result 26 in FIG. 5 correspond to the
execution instruction 14, the speculative execution flag 13 and the data
dependence detected result 16 in FIG. 3. The data dependence detector 20
has inputs of the speculative execution flag, the execution instruction and
the load/store instruction, which are supplied from the processor control
unit. The data dependence detector 20 has an output of the data dependence
detected result, which is supplied to the processor control unit. The
address for the load/store instruction comprises 8 bits.

FIG. 7A is a diagram illustrative of a first example of the hash
function circuit in the first embodiment in accordance with the present
invention. The address of the load/store instruction comprises 8 bits. The
hash function circuit 21 fetches the second and third bits directly for
conversion into the entry number, which is represented by 2 bits, on the
instruction history table 22. The instruction history table 22 comprises four
entries. The read and write operations are executed with reference to the
2-bit entry number outputted from the hash function circuit 21 as well as
the execution instruction and the speculative execution flag outputted from
the processor control unit.
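
The bit selection of FIG. 7A can be written as a one-line function. This is a sketch under an assumption: bit positions are counted from zero at the least significant bit, which is the convention consistent with the worked addresses given in this description (for example, "00101000" mapping to the entry "10"). The name `fig7a_entry_number` is hypothetical.

```python
def fig7a_entry_number(address):
    """Map an 8-bit address to a 2-bit entry number using bits 2 and 3."""
    if not 0 <= address <= 0xFF:
        raise ValueError("address must fit in 8 bits")
    # Shift bits 3..2 down to positions 1..0 and mask off everything else.
    return (address >> 2) & 0b11
```

For instance, `format(fig7a_entry_number(0b00101000), "02b")` gives `"10"` and `format(fig7a_entry_number(0b00001100), "02b")` gives `"11"`.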

FIG. 6A is a diagram illustrative of the instruction program
sequence involved in operations of the data dependence detector. The
instruction program sequence is the load instruction "LD1", the store
instruction "ST1", the load instruction "LD2", the load instruction "LD3",
and the load instruction "LD4". The load instruction "LD1" has an address
"A1". The load instruction "LD2" has an address "A2". The load
instruction "LD3" has an address "A3". The load instruction "LD4" has an
address "A4". The store instruction "ST1" has an address "A4". The load
instruction "LD4" and the store instruction "ST1" have the same address
"A4". The store instruction "ST1" is prior to the load instruction "LD4". A
data dependence is present from the store instruction "ST1" to the load
instruction "LD4". The store instruction "ST1" must be executed before the
load instruction "LD4" is executed. The correctness of the result of
execution of the program is not ensured unless the content written at the
address "A4" by the store instruction "ST1" is read out by the load
instruction "LD4".

If the processor is in the data dependence speculative execution
state, the instructions are executed in the sequence in which they have
become executable, even if the presence of the read after write dependence
from the store instruction to the load instruction has not been confirmed.

FIG. 6B is a diagram illustrative of the instruction execution
sequence involved in operations of the data dependence detector. The
instruction execution sequence is a non-program sequence execution or the
data dependence speculative execution of the instruction sequence of the
program. Cycle number, execution instruction, and address of the
execution instruction are shown. In the first cycle 1, the load instruction
"LD1" is executed at an address "A1=00100000". In the second cycle 2,
the load instruction "LD2" is executed at an address "A2=00101000". In
the third cycle 3, the load instruction "LD3" is executed at an address
"A3=00001100". In the fourth cycle 4, the load instruction "LD4" is
executed at an address "A4=00010000". In the fifth cycle 5, the store
instruction "ST1" is executed at the same address "A4=00010000" as the
load instruction "LD4".

The store instruction "ST1" and the load instruction "LD4"
have the read after write dependence between them, wherein the store
instruction "ST1" has to be executed prior to the load instruction "LD4".
Notwithstanding, the store instruction "ST1" has in fact been executed after
the load instruction "LD4" in violation of the read after write dependence.
This means that the data dependence speculative execution has not
succeeded. The load instruction "LD4" has read out a value from the address
"A4", wherein this value is not the value written by the store instruction
"ST1" at the address "A4" but is the value previous to the value written by
the store instruction "ST1" at the address "A4". For this reason, the
correctness of the executed result is not ensured.

The data dependence detector 20 detects this failure of the data
dependence speculative execution and then notifies this detected result
through the data dependence detected result 16 to the processor control unit
17. The processor control unit 17 performs the recovery process for the
failure of the data dependence speculative execution in order to obtain the
correct executed result.

FIG. 6C is a diagram illustrative of the operations of the data
dependence detector with reference to the instruction execution sequence.
Cycle number, speculative execution flag, entry number on the instruction
history table referred to by the executed load/store instruction, and state
of the instruction history table are shown. For the states of the
instruction history table 22, contents on the entries "00", "01", "10" and
"11" are shown.

For example, in zero cycle "0", the speculative execution flag has
the logical "0". The load and store instructions are not executed. The
instruction history table 22 is in the initial state. The contents of the
four entries are logical "0" which indicates that the load instruction has
not been executed.

From the first cycle 1 to the fifth cycle 5, the processor is in the 
speculative execution state, wherein the speculative execution flag takes 
the logical "1". 

In the first cycle 1, the load instruction "LD1" is subjected to the
data dependence speculative execution. The hash function circuit 21 takes
the second and third bits of the address "A1=00100000" for the load
instruction "LD1" and outputs the entry number "00" to be referred to by the
instruction history table 22. The instruction history table 22 refers to the
entry number "00" outputted from the hash function circuit 21 and also
refers to the execution instruction and the speculative execution flag, and
the instruction history table 22 writes the logical "1", which indicates
that the load instruction has been executed, as the content of the entry
"00".

In the second cycle 2, the load instruction "LD2" is subjected to
the data dependence speculative execution. The hash function circuit 21
takes the second and third bits of the address "A2=00101000" for the load
instruction "LD2" and outputs the entry number "10" to be referred to by the
instruction history table 22. The instruction history table 22 refers to the
entry number "10" outputted from the hash function circuit 21 and also
refers to the execution instruction and the speculative execution flag, and
the instruction history table 22 writes the logical "1", which indicates
that the load instruction has been executed, as the content of the entry
"10".

In the third cycle 3, the load instruction "LD3" is subjected to the
data dependence speculative execution. The hash function circuit 21 takes
the second and third bits of the address "A3=00001100" for the load
instruction "LD3" and outputs the entry number "11" to be referred to by the
instruction history table 22. The instruction history table 22 refers to the
entry number "11" outputted from the hash function circuit 21 and also
refers to the execution instruction and the speculative execution flag, and
the instruction history table 22 writes the logical "1", which indicates
that the load instruction has been executed, as the content of the entry
"11".

In the fourth cycle 4, the load instruction "LD4" is subjected to
the data dependence speculative execution. The hash function circuit 21
takes the second and third bits of the address "A4=00010000" for the load
instruction "LD4" and outputs the entry number "00" to be referred to by the
instruction history table 22. The instruction history table 22 refers to the
entry number "00" outputted from the hash function circuit 21 and also
refers to the execution instruction and the speculative execution flag, and
the instruction history table 22 writes the logical "1", which indicates
that the load instruction has been executed, as the content of the entry
"00".

In the fifth cycle 5, the store instruction "ST1" is subjected to the
data dependence speculative execution. The hash function circuit 21 takes
the second and third bits of the address "A4=00010000" for the store
instruction "ST1" and outputs the entry number "00" to be referred to by the
instruction history table 22. The instruction history table 22 refers to the
entry number "00" outputted from the hash function circuit 21 and also
refers to the execution instruction and the speculative execution flag, and
the instruction history table 22 reads out the logical "1", which indicates
that the load instruction has been executed, from the entry "00", whereby
the possibility of the read after write dependence from the store
instruction "ST1" to the load instruction "LD1" or "LD4" is detected.

Since the processor is in the data dependence speculative
execution state, the data dependence detector 20 takes the logical "1" for
the data dependence detected result 26 to notify the processor control unit
17 of the failure of the data dependence speculative execution.

Upon receipt of the notice of the failure of the data dependence
speculative execution, the processor control unit 17 performs the necessary
recovery process for the failure of the data dependence speculative
execution from the subsequent sixth cycle 6. Due to the failure of the data
dependence speculative execution, the speculative execution flag returns to
the logical "0". The instruction history table 22 resets the contents of all
the entries to the logical "0" for the next data dependence speculative
execution.
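
The cycle-by-cycle operation described above can be sketched in software.
The following Python model is only an illustration under stated
assumptions, not the circuit itself: the class and method names are
hypothetical, bit positions are counted from zero at the least significant
bit (a reading which reproduces the entry numbers "11" and "00" quoted
above), and the address A3 is assumed for the load executed in the third
cycle because it maps to the entry "11".

```python
class DataDependenceDetector:
    """Illustrative model of hash function circuit 21 and history table 22."""

    def __init__(self, index_bits=(3, 2)):
        # Address bit positions (bit 0 = least significant) forming the
        # entry number, most significant digit first. (3, 2) is an assumed
        # reading of "the second and third bits of the address".
        self.index_bits = index_bits
        self.table = [0] * (1 << len(index_bits))
        self.speculative = False  # models the speculative execution flag

    def entry_number(self, address):
        number = 0
        for position in self.index_bits:
            number = (number << 1) | ((address >> position) & 1)
        return number

    def start_speculation(self):
        # All entries start at logical "0" for a new speculative execution.
        self.table = [0] * len(self.table)
        self.speculative = True

    def load(self, address):
        # A speculatively executed load marks its entry with logical "1".
        if self.speculative:
            self.table[self.entry_number(address)] = 1

    def store(self, address):
        # Returns the data dependence detected result: 1 means that a read
        # after write dependence is possible and the speculation has failed.
        if not self.speculative:
            return 0
        detected = self.table[self.entry_number(address)]
        if detected:
            self.speculative = False
            self.table = [0] * len(self.table)
        return detected


A3, A4 = 0b00001100, 0b00010000
detector = DataDependenceDetector()
detector.start_speculation()
detector.load(A3)            # third cycle: entry "11" is set (A3 assumed)
detector.load(A4)            # fourth cycle, "LD4": entry "00" is set
result = detector.store(A4)  # fifth cycle, "ST1": entry "00" is read out
```

In this sketch `result` corresponds to the data dependence detected result
26, and the table is cleared on failure in the same way as the entries are
reset for the next data dependence speculative execution.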

The above is an example of the operation of the data dependence detector
when the data dependence speculative execution fails. If the data
dependence speculative execution has succeeded, then no recovery process
is executed and the subsequent instructions continue to be executed.

With reference again to FIG. 6C, even though the address
"A1=00100000" of the load instruction "LD1" is different from the address
"A4=00010000" of the load instruction "LD4", the same entry "00" of the
instruction history table 22 is referred to. That is, an alias appears,
because the hash function circuit 21 converts the address "A1=00100000"
and the address "A4=00010000" into the same entry number "00". Even if
the load instruction "LD4" had been executed after the store instruction
"ST1", so that no read after write dependence on "LD4" was detected, a
pseudo read after write dependence would still be detected from the store
instruction "ST1" to the load instruction "LD1". Thus, even though the data
dependence speculative execution has succeeded, the data dependence
detector incorrectly judges the data dependence speculative execution to
have failed. An unnecessary recovery process is thus executed, but this
recovery process does not change the meaning of the program, for which
reason the correctness of the executed result of the program is ensured
even if aliases appear.
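
The one-sided nature of this error can be illustrated by comparing the
hashed detection with an exact address comparison. The Python sketch below
is a hypothetical illustration only: the function names are invented for
this example, and the hash is assumed to take address bits 2 and 3 counted
from the least significant bit. It shows that the hashed table may report
a dependence which does not exist, but does not miss one which does exist.

```python
def entry_number(address):
    # Assumed hash: bits 3 and 2 of the address (bit 0 = least significant).
    return (address >> 2) & 0b11


def hashed_detection(load_addresses, store_address):
    # One-bit entries; every speculatively executed load sets its entry.
    table = [0] * 4
    for address in load_addresses:
        table[entry_number(address)] = 1
    return table[entry_number(store_address)] == 1


def exact_detection(load_addresses, store_address):
    # Reference behavior with a full address comparison, for contrast.
    return store_address in set(load_addresses)


A1, A2, A3, A4 = 0b00100000, 0b00101000, 0b00001100, 0b00010000

pseudo = hashed_detection([A1], A4)   # alias: A1 and A4 share entry "00"
true_dependence = exact_detection([A1], A4)
```

Here `pseudo` is true while `true_dependence` is false, which corresponds
to the unnecessary recovery process described above.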

The execution of the unnecessary recovery process reduces the
execution performance of the program. It is, however, possible to reduce
the probability of occurrence of aliases by appropriate selection of the
hash function of the hash function circuit 21 and by a sufficiently large
number of entries in the instruction history table 22. A sufficiently
small probability of occurrence of aliases allows a substantial
improvement in the execution performance of the program due to the effect
of the data dependence speculative execution.

For example, if the number of entries in the instruction
history table 22 increases from 4 to 8, and if the hash function circuit 21
takes the second, third and fourth bits of the address, then the address
"A1=00100000" is allocated to the entry "000", the address
"A2=00101000" is allocated to the entry "010", the address
"A3=00001100" is allocated to the entry "011", and the address
"A4=00010000" is allocated to the entry "100", for which reason no
aliases appear.

If the number of entries in the instruction history table 22 is
unchanged at 4, and if the hash function circuit 21 takes a first
exclusive-OR of the second and fourth bits of the address as the bit-0
digit of the entry number and a second exclusive-OR of the third and fifth
bits of the address as the bit-1 digit, then the address "A1=00100000" is
allocated to the entry "10", the address "A2=00101000" is allocated to the
entry "00", the address "A3=00001100" is allocated to the entry "11", and
the address "A4=00010000" is allocated to the entry "01", for which reason
no aliases appear.

Accordingly, it is possible to reduce the probability of occurrence
of aliases by appropriate selection of the hash function of the hash
function circuit 21 and by a sufficiently large number of entries in the
instruction history table 22.
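
The three hash functions discussed above can be compared with a short
Python sketch. It is an illustration under assumptions: the function names
are hypothetical and bit positions are counted from zero at the least
significant bit, which is the reading that reproduces the entry numbers
given in the text.

```python
ADDRESSES = {
    "A1": 0b00100000,
    "A2": 0b00101000,
    "A3": 0b00001100,
    "A4": 0b00010000,
}


def bit(address, position):
    # Bit 0 is the least significant bit (assumed numbering).
    return (address >> position) & 1


def hash_two_bits(address):
    # Second and third bits: 4 entries.
    return format((bit(address, 3) << 1) | bit(address, 2), "02b")


def hash_three_bits(address):
    # Second, third and fourth bits: 8 entries.
    return format((address >> 2) & 0b111, "03b")


def hash_xor(address):
    # Bit-0 digit: second XOR fourth bit; bit-1 digit: third XOR fifth bit.
    digit0 = bit(address, 2) ^ bit(address, 4)
    digit1 = bit(address, 3) ^ bit(address, 5)
    return format((digit1 << 1) | digit0, "02b")


def allocation(hash_function):
    return {name: hash_function(a) for name, a in ADDRESSES.items()}


def has_alias(hash_function):
    entries = list(allocation(hash_function).values())
    return len(set(entries)) < len(entries)
```

For these four addresses, `has_alias(hash_two_bits)` is true, because A1
and A4 share the entry "00", while both `has_alias(hash_three_bits)` and
`has_alias(hash_xor)` are false. Other address sets may of course still
collide under any of these functions.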

The data dependence detector provides an effect that the
hardware size necessary for judging the success or the failure of the data
dependence speculative execution is small. The number of load
instructions which may be subjected to the data dependence speculative
execution is not limited by the hardware size of the data dependence
detector. The small hardware size allows an improvement in the execution
performance of the data dependence speculative execution.

The addresses of the load instructions are allocated by the hash
function circuit 21 to specific entries of the instruction history table 22
with overlap, for which reason it is unnecessary to store all of the
addresses of the executed load instructions and also unnecessary to provide
any address comparator for comparing those addresses to the addresses of
the store instructions.

The data dependence detector also provides another effect that it 
is unnecessary to compare the address of the store instruction and the 
address of the load instruction in the detection process for the read after 
write dependence in the execution of the store instruction. 

Further, only one entry is to be read out from the instruction
history table, for which reason it is unnecessary to compute a logical-OR.
The time necessary for detecting the read after write dependence is short,
and it is easy to obtain the desired high speed and high frequency
performance of the processor.

In accordance with the above-described first embodiment, if the
load instruction has been executed, the address of the load instruction is
converted by the hash function circuit 11 into the entry number of the
instruction history table 12. The value indicating that the load instruction
has been executed is stored into the entry of the instruction history table
12 which is indicated by the converted entry number. If the store
instruction has been executed, the address of the store instruction is
converted by the hash function circuit 11 into the entry number of the
instruction history table 12. The value indicating whether a load
instruction has been executed is read out from the entry of the instruction
history table 12 which is indicated by the converted entry number. As a
result, the possibility of the presence of the read after write dependence
is detected. The above-described structure may also be applied to detect
the possibility of a write after read dependence or the possibility of a
write after write dependence.

For example, if the store instruction has been executed, the
address of the store instruction is converted by the hash function circuit 11
into the entry number of the instruction history table 12. The value
indicating that the store instruction has been executed is stored into the
entry of the instruction history table 12 which is indicated by the
converted entry number. If the load instruction has been executed, the
address of the load instruction is converted by the hash function circuit 11
into the entry number of the instruction history table 12. The value
indicating whether a store instruction has been executed is read out from
the entry of the instruction history table 12 which is indicated by the
converted entry number. As a result, the possibility of the presence of the
write after read dependence is detected.

For example, if the store instruction has been executed, the
address of the store instruction is converted by the hash function circuit 11
into the entry number of the instruction history table 12. The value
indicating that the store instruction has been executed is read out from the
entry of the instruction history table 12, which is indicated by the entry
number as converted. In addition, the value indicating that the store
instruction has been executed is stored into the entry of the instruction
history table 12, which is indicated by the entry number as converted. As a
result, the possibility of the presence of the write after write dependence
between the store instructions is detected.

For example, if the load instruction has been executed, the
address of the load instruction is converted by the hash function circuit 11
into the entry number of the instruction history table 12. The value
indicating that the load instruction has been executed is read out from the
entry of the instruction history table 12, which is indicated by the entry
number as converted. In addition, the value indicating that the load
instruction has been executed is stored into the entry of the instruction
history table 12, which is indicated by the entry number as converted. If the
store instruction has been executed, the address of the store instruction is
converted by the hash function circuit 11 into the entry number of the
instruction history table 12. The value indicating that the store instruction
has been executed is read out from the entry of the instruction history table
12, which is indicated by the entry number as converted. In addition, the
value indicating that the store instruction has been executed is stored into
the entry of the instruction history table 12, which is indicated by the entry
number as converted. As a result, the possibility of the presence of the read
after write dependence, the possibility of the presence of the write after
read dependence and the possibility of the presence of the write after write
dependence between the store instructions are detected.
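
One possible reading of the combined variation above is sketched below in
Python. The layout is an assumption and not taken from the text: each
entry is modeled with two one-bit fields, one recording an executed load
and one recording an executed store, and the class name, method names and
the simple hash are hypothetical. A load checks the store field (possible
write after read dependence), and a store checks the load field (possible
read after write dependence) and the store field (possible write after
write dependence).

```python
class HazardHistoryTable:
    """Assumed two-field variant of the instruction history table."""

    def __init__(self, entries=4):
        self.entries = entries
        self.load_seen = [0] * entries
        self.store_seen = [0] * entries

    def entry_number(self, address):
        # Hypothetical hash: drop the two lowest bits, then wrap.
        return (address >> 2) % self.entries

    def load(self, address):
        n = self.entry_number(address)
        possible = set()
        if self.store_seen[n]:
            possible.add("WAR")  # a store to this entry already executed
        self.load_seen[n] = 1
        return possible

    def store(self, address):
        n = self.entry_number(address)
        possible = set()
        if self.load_seen[n]:
            possible.add("RAW")  # a load from this entry already executed
        if self.store_seen[n]:
            possible.add("WAW")  # another store to this entry already executed
        self.store_seen[n] = 1
        return possible
```

As with the single-bit table, each reported dependence is only a
possibility, since different addresses may share an entry.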

For the data dependence speculative execution, the data
dependence detector does not strictly detect the presence of the read after
write dependence from the store instruction to the load instruction, but does
detect the possibility of the presence of the read after write dependence
from the store instruction to the load instruction. The address of the load
instruction is allocated by the hash function circuit, with duplication or
overlap, to the designated entry of the instruction history table. It is
unnecessary to store all of the addresses of the load instructions as executed
and also unnecessary to provide an address comparator for comparison to
the address of the store instruction. The data dependence detector provides
an effect that the hardware size necessary for judging the success or the
failure of the data dependence speculative execution is small. The number
of load instructions which may be subjected to the data dependence
speculative execution is not limited by the hardware size of the data
dependence detector, for which reason a small hardware size may improve
the performance of the data dependence speculative execution. The data
dependence detector also provides another effect that it is unnecessary to
compare the address of the store instruction and the address of the load
instruction in the detection process for the read after write dependence in
the execution of the store instruction. Further, only one entry is to be read
out from the instruction history table, for which reason it is unnecessary to
compute a logical-OR. The time necessary for detecting the read after
write dependence is short, and it is easy to obtain the desired high speed and
high frequency performance of the processor.

A second embodiment according to the present invention will be
described in detail with reference to the drawings. Another data
dependence detector is provided in this embodiment. The data dependence
detector realizes the data dependence speculative execution in thread
parallel processing in a multiple processor system. Thread parallel
processing means that a single program or plural programs are divided into
plural program units called "threads", and the threads are processed in
parallel by the plural processors to obtain high speed processing of the
programs.

The thread is a unit of the program. The thread comprises a
sequence of plural instructions. A sequence among the plural threads is
defined. For example, two threads "T0" and "T1" belong to the same
program, and a sequence relation is defined between the two threads "T0"
and "T1". If the thread "T0" is prior to the thread "T1", then this means
that, in the program, the thread "T0" should be executed prior to the thread
"T1". If the thread "T1" is prior to the thread "T0", then this means that,
in the program, the thread "T1" should be executed prior to the thread "T0".
If no dependence is present between the two threads "T0" and "T1", then
executing the two threads "T0" and "T1" without regard to the sequence
relation between them does not change the meaning of the program, and the
correct result of execution of the program can be obtained.

In thread parallel processing, the threads are executed in
parallel with each other, independently of the sequence relation between
them, to obtain high speed execution of the program. This is referred to
as the non-program-sequence execution in thread units, in comparison to
the non-program-sequence execution in instruction units.

The non-program-sequence execution in thread units needs to observe
and resolve the dependences between the instructions included in the
threads in order to obtain the correct result of the program.

Similarly to the non-program-sequence execution in instruction
units, it is necessary to execute in the program sequence, particularly
where a read after write dependence relating to a memory is concerned.
Execution strictly in the program sequence, however, does not obtain the
performance improvement of the non-program-sequence execution.
Particularly, this is a more serious problem for the non-program-sequence
execution in thread units, because the non-program-sequence execution is
then prevented for whole thread units, each of which comprises plural
instructions.

As a countermeasure to the above problem, it is effective to perform
the data dependence speculative execution. The presence or absence of the
read after write dependence between the instructions included in the
threads is not confirmed. Instead, it is assumed that no read after write
dependence is present, and the non-program-sequence execution in thread
units is speculatively carried out. Namely, the data dependence
speculative execution in thread units is implemented.

For realizing the data dependence speculative execution in the
thread units, it is necessary to detect the read after write dependence from
the store instruction included in the prior thread to the load instruction
included in the later thread.

The data dependence detector of this second embodiment has a
function to detect the read after write dependence from the store instruction
included in the prior thread to the load instruction included in the later
thread. The data dependence detector performs the non-program-sequence
execution in the thread units, but assumes that the non-program-sequence
execution is not made between instructions included in the same thread.
The data dependence detector does not detect the read after write
dependence between instructions included in the same thread.

FIG. 8 is a diagram illustrative of a structure of a multiple
processor system including a data dependence detector in a second
embodiment in accordance with the present invention. FIG. 7B is a diagram
illustrative of a first example of the hash function circuit in the second
embodiment in accordance with the present invention. The system includes
four processors 50, 51, 52 and 53, and a thread control unit 55. The
processors 50, 51, 52 and 53 have the same structure. The illustration of the
internal structures of the processors 52 and 53 is omitted.

The thread control unit 55 allocates threads to the processors 50,
51, 52 and 53, so that the processors 50, 51, 52 and 53 execute the
respectively allocated threads. The thread control unit 55 also notifies the
processors 50, 51, 52 and 53 of the sequence relations between the threads.
The thread control unit 55 also instructs the processors 50, 51, 52 and 53 to
perform the recovery process for the failures of the data dependence
speculative execution in the thread units. The thread control unit 55 thus
controls the thread parallel processing and the data dependence
speculative execution. Each of the processors 50, 51, 52 and 53 includes a
data dependence detector 30, a processor control unit 57 and an instruction
execution unit 18. Other elements of the processors are not illustrated.

The thread control unit 55 allocates the threads to the processors
50, 51, 52 and 53 for the thread parallel processing of the program by the
processors 50, 51, 52 and 53. The thread control unit 55 also supplies the
processors 50, 51, 52 and 53 with the speculative execution flag, which
indicates whether or not the thread is executed in the data dependence
speculative execution, and with the thread sequences, which indicate the
sequence relations to the other threads to be executed by the other three
processors.

Each of the processors 50, 51, 52 and 53 has input ports for
speculative execution flags 39 and for thread sequences 40. When the
processor executes the thread allocated by the thread control unit 55, the
processor control unit 57 supplies the kind of the instruction which is to
be executed to the instruction execution unit 18 and the data dependence
detector 30, as well as to the other three processors through execution
instructions 46.

The data dependence detector 30 receives the input of the kind of
the instruction to be executed from an input port of an execution instruction
34. Concurrently, the data dependence detector 30 also receives the inputs
of the kinds of the instructions to be executed by the other three processors
from input ports of execution instructions 38. If the instruction to be
executed is the load or store instruction, the processor control unit 57
supplies the instruction execution unit 18 and the data dependence detector
30 with the address of the load instruction or the store instruction, and
also sends the address of the load instruction or the store instruction
through a load/store address 45 to the other three processors. The data
dependence detector 30 receives an input of the address of the load
instruction or the store instruction from a load/store address 35. If the
instructions to be executed by the other three processors are load
instructions or store instructions, then the data dependence detector 30
receives inputs of the addresses of the load instructions or the store
instructions from load/store addresses 37.

If the processor 50 executes the thread allocated by the thread
control unit 55, then the processor control unit 57 of the processor 50
supplies the kind of the instruction to be executed to the instruction
execution unit 18 and the data dependence detector 30 as well as to the
execution instruction 46 which is then transmitted to the other three
processors 51, 52 and 53. Concurrently, the data dependence detector 30
receives, from the execution instructions 38, the kinds of the instructions to
be executed by the other three processors 51, 52 and 53.

If the instruction to be executed is the load instruction or the
store instruction, the processor control unit 57 of the processor 50 supplies
the address of the instruction to the instruction execution unit 18 and the
data dependence detector 30 of the processor 50 as well as to the load/store
addresses 45 which are then transmitted to the other three processors 51, 52
and 53. Concurrently, if the instructions to be executed by the other three
processors 51, 52 and 53 are the load instructions or the store instructions,
then the data dependence detector 30 of the processor 50 receives the input
of the addresses of the instructions from the load/store addresses 37.

The data dependence detector 30 of each of the processors 50-53
includes the hash function circuit 31, the instruction history table 32 and a
logic-OR circuit 41. The processor in which the circuitry such as the data
dependence detector 30, the processor control unit 57 and the instruction
execution unit 18 is provided will hereinafter be referred to as the
self-processor.

The data dependence detector 30 receives the inputs of the
execution instruction 34 and the load/store address 35 from the
processor control unit 57 in the self-processor, and receives the inputs
of the execution instructions 38 and the load/store addresses 37 from the
other three processors. The data dependence detector 30 also has input
ports for the speculative execution flag 39 and the thread sequence 40 from
the thread control unit 55. The data dependence detector 30 also has an
output port for the data dependence detected result 36, which is supplied to
the thread control unit 55.

Each of the processors 50-53 has four hash function circuits 31,
each of which comprises a logic circuit realizing a hash function "f" which
converts the address of the load instruction or the store instruction into the
entry number of the instruction history table 32. The hash function "f"
takes the same value for the same input. Namely, if N1 = f(A1) and
N2 = f(A2), then A1 = A2 implies N1 = N2. One of the four hash function
circuits 31 converts the address of the load instruction to be executed by
the self-processor into the entry number of the instruction history table 32.
Each of the remaining three hash function circuits 31 converts the address
of the store instruction to be executed by a respective one of the other
three processors into the entry number of the instruction history table 32.

The instruction history table 32 comprises a plurality of entries,
each of which stores one-bit information. If the output from the hash
function circuit 31 has an n-bit width, then the total number of the entries
is equal to 2 to the n-th power. The instruction history table 32 has a
single write port and three read ports for performing a single write
operation and three read operations concurrently.

One of the four hash function circuits 31 has the input port for the
address of the load instruction to be executed by the self-processor. This
hash function circuit 31 has an output port which is connected to the write
port of the instruction history table 32 for a write operation to the entry
designated by the entry number supplied from the hash function circuit 31.
Each of the remaining three hash function circuits 31, which have the input
ports for the addresses of the store instructions to be executed by the
other processors, has an output port which is connected to a respective one
of the read ports of the instruction history table 32.

The instruction history table 32 is capable of initializing all of the
entries upon receipt of the input of the speculative execution flag 39 from
the thread control unit 55, wherein the same value is written into all of
the entries.

A logic-OR circuit 41 computes a logic-OR of the three results read out
from the three read ports of the instruction history table 32, and the
result of this operation is supplied to the data dependence detected result
36, which is transmitted to the thread control unit 55.

Operations of the data dependence detector 30 will be described
with reference to FIG. 8. In an initial state, all of the entries of the
instruction history table 32 are in the first state, for example, logic "0",
which indicates that the load instruction has not been executed by the
self-processor. If the self-processor does not perform the speculative
instruction execution with respect to the data dependence and does perform
the instruction execution in the order defined by the program (hereinafter
referred to as the "definitive execution state"), then the data dependence
detector 30 does not perform any operation.

If the self-processor does perform the speculative instruction
execution with respect to the data dependence and does not perform the
instruction execution in the order defined by the program, then the data
dependence detector 30 performs a detecting process for a read after write
dependence from the store instruction to the load instruction, wherein the
store instruction is to be executed by the processor allocated with a prior
thread which is prior in program sequence to the thread to be executed by
the self-processor, whilst the load instruction is to be executed by the
self-processor.

The speculative execution flag 39 supplied from the thread
control unit 55 and inputted into the data dependence detector 30 indicates
whether the self-processor is in the definitive execution state or the
speculative execution state.

Whether the thread to be executed by the self-processor is prior
in program sequence to the threads to be executed by the other three
processors is judged from the thread sequence 40 which is supplied from the
thread control unit 55 and inputted into the data dependence detector 30.
The kinds of the instructions being executed by the other three processors
are judged from the execution instructions 38 supplied from the other three
processors into the data dependence detector 30.

If the self-processor is in the speculative execution state, after the
self-processor performs the speculative execution of the load instruction,
the data dependence detector 30 records in the instruction history table 32
that the load instruction has been speculatively executed. In more
detail, the address of the load instruction is converted by the hash function
circuit 31 into the entry number of the instruction history table 32 for input
into the write port of the instruction history table 32. The instruction history
table 32 changes the entry corresponding to the inputted entry number into a
second state, for example, a logical value "1", which indicates that the load
instruction has been speculatively executed.

If the self-processor is in the speculative execution state and if
the store instruction is executed by the processor which is allocated with
the prior thread which is prior in program sequence to the thread to be
executed by the self-processor, then the data dependence detector 30 reads
out the state from the corresponding entry of the instruction history table 32.

In more detail, the address of the store instruction is converted by the hash
function circuit 31 into the entry number of the instruction history table 32
for input into the corresponding one of the three read ports of the
instruction history table 32. The instruction history table 32 reads out the
content of the entry designated by the inputted entry number and supplies
the content to the logic-OR circuit 41. The logic-OR circuit 41 computes the
logic-OR of the outputs from the three read ports of the instruction history
table 32 and supplies the result to the data dependence detected
result 36, which is transmitted to the thread control unit 55.

If the address of the store instruction executed by the
processor allocated with the thread prior to the thread executed by the
self-processor is either equal to the address of the load instruction
subjected to the data dependence speculative execution by the self-processor
or allocated to the same entry of the instruction history table 32 due to
the appearance of an alias, then the value indicating that a load instruction
subjected to the data dependence speculative execution is present is read
out from the instruction history table 32, so that the possibility of the
presence of the read after write dependence is detected. In this case, the
data dependence detector 30 supplies the value indicating the failure of the
data dependence speculative execution, for example, logical "1", onto the
data dependence detected result 36, which is then transmitted to the thread
control unit 55.
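
The operation of the data dependence detector 30 in one processor can be
sketched as follows. This Python model is only an illustration under
assumptions: the class, the method names and the tuple layout are
hypothetical, the hash function "f" is assumed to take the address bits
starting at bit 2 (counted from the least significant bit), and the
thread sequence 40 is reduced to a per-processor flag telling whether the
other processor executes a prior thread.

```python
class ThreadDataDependenceDetector:
    """Illustrative model of detector 30 as seen from the self-processor."""

    def __init__(self, index_bits=2):
        self.index_bits = index_bits
        self.table = [0] * (1 << index_bits)  # 2 to the n-th power entries
        self.speculative = False

    def entry_number(self, address):
        # Assumed hash function "f"; the same address gives the same entry.
        return (address >> 2) & ((1 << self.index_bits) - 1)

    def set_speculative(self, flag):
        # Input of the speculative execution flag initializes all entries.
        self.table = [0] * len(self.table)
        self.speculative = flag

    def own_load(self, address):
        # Write port: record a speculatively executed load of this processor.
        if self.speculative:
            self.table[self.entry_number(address)] = 1

    def detected_result(self, other_ops):
        """other_ops holds up to three (kind, address, is_prior) tuples,
        one per other processor. Returns the logic-OR of the read ports."""
        if not self.speculative:
            return 0
        result = 0
        for kind, address, is_prior in other_ops:
            if kind == "store" and is_prior:
                result |= self.table[self.entry_number(address)]
        return result
```

For example, after `own_load(0b00010000)` in the speculative execution
state, a store to the same address by a processor executing a prior thread
gives a detected result of 1, whereas the same store by a processor
executing a later thread gives 0.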

The thread control unit 55 receives the notice of the failure of the
data dependence speculative execution from any of the processors 50-53,
and the thread control unit 55 sends a request for the recovery process against
the failure of the data dependence speculative execution to the processor
which has supplied the above notice of the failure of the data dependence
speculative execution as well as to the processors executing the threads post
in program sequence to the thread executed by the processor which has
supplied the above notice. In each processor which has received the request
for the recovery process against the failure of the data dependence
speculative execution, the processor control unit 57 is notified of the
request for the recovery process through the recovery execution request 47.

The reason why the request for the recovery process against the failure
of the data dependence speculative execution is sent not only to the
processor which has supplied the above notice of the failure of the data
dependence speculative execution but also to the processors executing the
threads post in program sequence to the thread executed by the processor
which has supplied the above notice is as follows. It is possible that
the processor which has notified the failure of the data dependence
speculative execution has subjected the load instruction to the data
dependence speculative execution in violation of the read after write
dependence. It is therefore possible that the executed result of the thread
with the failure of the data dependence speculative execution is not correct.
It is thus possible that the executed result of a thread post in program
sequence to the thread with the failure of the data dependence speculative
execution is also not correct. It is further possible that the executed
result of a thread which is post in program sequence to that later thread is
also not correct. Consequently, it is possible that the executed results of
any threads which are post in program sequence to the thread with the
failure of the data dependence speculative execution are not correct. For
those reasons, the request for the recovery process against the failure of
the data dependence speculative execution is sent not only to the processor
which has supplied the above notice of the failure of the data dependence
speculative execution but also to the processors executing the threads post
in program sequence to the thread executed by the processor which has
supplied the above notice, in order to ensure the correctness of the
executed result of the threads.
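
The selection of the processors which receive the recovery request can be
sketched as follows. The function name and the representation of the
thread sequence as a list ordered from the earliest to the latest thread
are assumptions made only for this Python illustration.

```python
def recovery_targets(thread_order, notifying_processor):
    """Return the processors which should receive the recovery request.

    thread_order lists the processors from the one executing the earliest
    thread in program sequence to the one executing the latest thread
    (a hypothetical representation of the thread sequence relations).
    """
    position = thread_order.index(notifying_processor)
    # The notifying processor and every processor executing a later thread.
    return thread_order[position:]


# Hypothetical allocation: processor 51 runs the earliest thread.
order = [51, 50, 53, 52]
targets = recovery_targets(order, 50)
```

With this hypothetical allocation, a notice from the processor 50 leads to
recovery requests for the processors 50, 53 and 52, while the processor 51,
which executes a prior thread, is not affected.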

In the processor having received the request for the recovery
process against the failure of the data dependence speculative execution
from the thread control unit 55, the processor control unit 57 is notified of
the request for the recovery process against the failure of the data
dependence speculative execution through the recovery process execution
request 47. In this case, each of the processors 50-53 which has received
the request performs the necessary recovery process to ensure the
correctness of the executed result of the threads.

The data dependence detector 30 has a function to detect the read
after write dependence from a store instruction included in a thread
prior in program sequence to the thread executed by the self-processor to
a load instruction included in the thread executed by the self-processor.
This function is realized in that the instruction history table 32 of the
data dependence detector 30 has a write port for the writing operation
upon the data dependence speculative execution of a load instruction by
the self-processor and also has plural read ports for the read out
operations upon the executions of store instructions by the other
processors. This function enables the data dependence speculative
execution in thread units in the thread parallel processings, thereby
improving the processing speed.

The data dependence detector supports the non-program-sequence
execution in thread units, but assumes that the non-program-sequence
execution is not made between instructions included in the same thread.
The data dependence detector therefore does not detect the read after
write dependence between instructions included in the same thread.

The data dependence detector 30 does not strictly detect the
presence of the read after write dependence from the store instruction to
the load instruction, but does detect the possibility of the presence of
the read after write dependence from the store instruction to the load
instruction. The address of the load instruction is allocated by the hash
function circuit 31, with duplication or overlap allowed, to the
designated entry of the instruction history table. It is unnecessary to
store all of the addresses of the executed load instructions and also
unnecessary to provide an address comparator for comparison with the
address of the store instruction. The data dependence detector thus
provides an effect that the necessary hardware size for judging the
success or the failure of the data dependence speculative execution is
small.

The number of load instructions which may be subjected to the
data dependence speculative execution is not limited by the hardware size
of the data dependence detector, for which reason even a small hardware
size may improve the performance through the data dependence speculative
execution. In particular, the data dependence speculative execution in
thread units is likely to involve a large number of load/store
instructions, so this independence from the hardware size is all the more
effective for the data dependence speculative execution in thread units.
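
The behaviour described above can be sketched in software. The following
Python fragment is only an illustrative model, not the circuit of the
embodiment: the class name, the method names and the choice of hash
(low-order address bits) are assumptions introduced here. It shows that a
store probing the table reports a possible dependence both for the same
address and for a different address which happens to share an entry.

```python
class InstructionHistoryTable:
    """One bit per entry, indexed by a hash of the load/store address."""

    def __init__(self, index_bits=2):
        self.index_bits = index_bits
        self.entries = [0] * (1 << index_bits)

    def hash_address(self, address):
        # Assumed hash: the low-order bits of the address. Distinct addresses
        # may share an entry ("duplication or overlap" in the description).
        return address & ((1 << self.index_bits) - 1)

    def record_load(self, address):
        # Write port: a speculatively executed load marks its entry.
        self.entries[self.hash_address(address)] = 1

    def check_store(self, address):
        # Read port: a store of a prior thread probes the entry.
        # True only means that a read after write dependence is possible.
        return self.entries[self.hash_address(address)] == 1

    def reset(self):
        # Return every entry to the initial state (no load executed).
        self.entries = [0] * len(self.entries)


table = InstructionHistoryTable(index_bits=2)
table.record_load(0x1004)
hit_same = table.check_store(0x1004)   # same address as the load
hit_alias = table.check_store(0x2004)  # different address, same entry
miss = table.check_store(0x1005)       # different entry
```

The aliasing case reports a dependence that does not really exist, which
costs an unnecessary recovery but never hides a real dependence. The table
size stays fixed however many loads are recorded.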

The data dependence speculative execution in the thread parallel
processings will be described in more detail with reference to FIGS. 8, 9A,
9B and 9C. FIG. 9A is a diagram illustrative of a program sequence of
instructions. The program includes twelve instructions which are grouped
into four threads, each of which includes three instructions. The twelve
instructions include load instructions "LD1", "LD2", "LD3", "LD4", and
"LD5", store instructions "ST1", "ST2", and "ST3", and other
instructions marked with a black box.

The program sequence is as follows: the load instruction "LD1" at an
address "A1", the other instruction, the store instruction "ST1" at an
address "A3", the load instruction "LD2" at an address "A2", the store
instruction "ST2" at an address "A1", the other instruction, the load
instruction "LD3" at an address "A3", the other instruction, the load
instruction "LD4" at an address "A4", the load instruction "LD5" at an
address "A5", the store instruction "ST3" at an address "A3", and the other
instruction.

The program is divided into four threads "0", "1", "2" and "3"
which are to be executed by the four processors 50-53 in parallel. Each of
the threads "0", "1", "2" and "3" includes three instructions. The thread "0"
includes the load instruction "LD1" at the address "A1", the other
instruction, and the store instruction "ST1" at the address "A3". The thread
"1" includes the load instruction "LD2" at the address "A2", the store
instruction "ST2" at the address "A1" and the other instruction. The thread
"2" includes the load instruction "LD3" at the address "A3", the other
instruction, and the load instruction "LD4" at the address "A4". The thread
"3" includes the load instruction "LD5" at the address "A5", the store
instruction "ST3" at the address "A3", and the other instruction. The thread
"0" is executed by the processor 52. The thread "1" is executed by the
processor 51. The thread "2" is executed by the processor 50. The thread
"3" is executed by the processor 53.

The load instruction "LD3" and the store instruction "ST1" have
the same address "A3". The store instruction "ST1" is prior in program
sequence to the load instruction "LD3". There is thus a data dependence
from the store instruction "ST1" to the load instruction "LD3". In order
to ensure the correctness of the executed result of the program, the store
instruction "ST1" must be executed prior to the load instruction "LD3", so
that the content is stored at the address "A3" by the store instruction
"ST1" before the content is read out from the address "A3" by the load
instruction "LD3". In the parallel processings in thread units, the read
after write dependence is present from the thread "0" including the store
instruction "ST1" to the thread "2" including the load instruction "LD3".
At the time when the threads are allocated to the processors, the read
after write dependence from the store instruction "ST1" to the load
instruction "LD3", or the read after write dependence from the thread "0"
to the thread "2", is not yet known. For those reasons, in the definitive
thread parallel processings, it is necessary that the threads for which
the absence of the read after write dependence has not been confirmed are
executed sequentially to ensure the read after write dependence. It is
then difficult to obtain the advantages in high speed processings of the
thread parallel processings.
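
As a cross-check of the example, the dependences between threads can be
enumerated in software. The Python sketch below is an illustration only:
unlike the hardware, it knows every address in advance, and the function
name and data layout are assumptions introduced here. It pairs each store
with every later load at the same address in a different thread.

```python
# Program of FIG. 9A as (name, kind, address); the "other" instructions
# access no address.
PROGRAM = [
    ("LD1", "load", "A1"), ("other", "other", None), ("ST1", "store", "A3"),
    ("LD2", "load", "A2"), ("ST2", "store", "A1"), ("other", "other", None),
    ("LD3", "load", "A3"), ("other", "other", None), ("LD4", "load", "A4"),
    ("LD5", "load", "A5"), ("ST3", "store", "A3"), ("other", "other", None),
]
THREAD_SIZE = 3  # three consecutive instructions form one thread


def cross_thread_raw(program, thread_size):
    """Return (store, store_thread, load, load_thread) for each store that is
    followed in program sequence by a load at the same address in another
    thread."""
    deps = []
    for i, (s_name, s_kind, s_addr) in enumerate(program):
        if s_kind != "store":
            continue
        for j in range(i + 1, len(program)):
            l_name, l_kind, l_addr = program[j]
            same_thread = i // thread_size == j // thread_size
            if l_kind == "load" and l_addr == s_addr and not same_thread:
                deps.append((s_name, i // thread_size, l_name, j // thread_size))
    return deps


DEPENDENCES = cross_thread_raw(PROGRAM, THREAD_SIZE)
```

For this program the only pair found is the one from "ST1" in the thread
"0" to "LD3" in the thread "2".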

In accordance with the data dependence speculative execution in
thread units, the threads are speculatively executed in parallel even
if the presence of the read after write dependence between the threads is
unknown. If the probability of the success of the data dependence
speculative execution is sufficiently high, then the advantages in high
speed processings of the program by the thread parallel processings can be
obtained.

FIG. 9B is a diagram illustrative of the instruction execution
sequences of the data dependence speculative execution in the thread
parallel execution. FIG. 9B shows, for each cycle number, the instruction
executed in that cycle and the address of that instruction for each of the
processor 50 which executes the thread "2", the processor 51 which
executes the thread "1", the processor 52 which executes the thread "0",
and the processor 53 which executes the thread "3".
The processor 50 executes the thread "2", wherein the processor
50 executes the load instruction "LD3" at the address "A3" in cycle 1, the
other instruction in cycle 2, and the load instruction "LD4" at the address
"A4" in cycle 3. The processor 51 executes the thread "1", wherein the
processor 51 executes the load instruction "LD2" at the address "A2" in
cycle 1, the store instruction "ST2" at the address "A1" in cycle 2 and the
other instruction in cycle 3. The processor 52 executes the thread "0",
wherein the processor 52 executes the load instruction "LD1" at the address
"A1" in cycle 1, the other instruction in cycle 2, and the store
instruction "ST1" at the address "A3" in cycle 3. The processor 53 executes
the thread "3", wherein the processor 53 executes the load instruction
"LD5" at the address "A5" in cycle 1, the store instruction "ST3" at the
address "A3" in cycle 2, and the other instruction in cycle 3.

The thread "0" is the most prior thread in the program sequence.
There is no read after write dependence from the store instructions
included in the threads "1", "2" and "3" to the load instruction included
in the thread "0". The processor 52 therefore executes the thread "0"
definitively. Whether the read after write dependence is present for the
threads "1", "2" and "3" is unknown. The processors 51, 50 and 53 execute
the threads "1", "2" and "3", respectively, speculatively, namely in the
data dependence speculative execution. As shown in FIG. 9B, if the data
dependence speculative execution succeeds, then the program comprising the
twelve instructions is executed in only three cycles. A remarkable
improvement of the program execution speed can be obtained.
In the case shown in FIG. 9B, however, the load instruction "LD3" is
executed by the processor 50 in cycle 1 and the store instruction "ST1" is
executed by the processor 52 in cycle 3, so that the read after write
dependence from the store instruction "ST1" to the load instruction "LD3"
is not ensured. The correct executed result cannot be obtained, and the
data dependence speculative execution of the thread "2" by the processor
50 fails. The data dependence detector 30 in the processor 50 detects the
read after write dependence from the store instruction "ST1" executed by
the processor 52 to the load instruction "LD3" executed by the processor
50. The data dependence detector 30 also notifies the thread control unit
55 of the failure of the data dependence speculative execution of the
thread "2". The thread control unit 55 receives the notice of the failure
of the data dependence speculative execution of the thread "2" and then
sends the request for the necessary recovery processes to the processor 50
and also to the processor 53 which executes the thread "3" post in program
sequence to the thread "2" whose data dependence speculative execution has
failed.

FIG. 9C is a diagram illustrative of the data dependence detection
operation by the data dependence detector in the processor 50 in the
instruction execution sequence shown in FIG. 9B. Cycle numbers and
states of the instruction history table 32 in respective cycles are shown.
The instruction history table 32 comprises four entries "00", "01", "10"
and "11".

In cycle "0", the instruction history table 32 is in the initial state,
wherein the contents of all the entries are logical "0" which indicates
that the load instruction has not been executed. In cycle "1", the
processor 50 performs the data dependence speculative execution of the
load instruction "LD3" included in the thread "2". The address "A3" of the
load instruction "LD3" is converted by the hash function circuit 31 into
the entry number "00" of the instruction history table 32, and the content
of the entry "00" of the instruction history table 32 becomes logical "1"
which indicates that the load instruction has been executed.

In cycle "2", the processor 51 executes the store instruction
"ST2" included in the thread "1" and the processor 53 executes the store
instruction "ST3" included in the thread "3". The thread "1" executed by
the processor 51 is prior in program sequence to the thread "2" subjected
to the data dependence speculative execution by the processor 50. The
thread "1" is therefore subject to the read after write dependence
detection. The data dependence detector 30 receives the kinds of the
instructions executed by the other processors from the execution
instructions 38 and also receives the addresses of the executed
instructions from the load/store addresses 37. The address "A1" of the
store instruction "ST2" is converted by the hash function circuit 31 into
the entry number "10" of the instruction history table 32, and the data
dependence detector 30 reads out the content of the entry "10" of the
instruction history table 32 for detecting the read after write dependence
from the store instruction "ST2" to the load instruction executed by the
self-processor. In this case, the logical "0" indicating that the load
instruction has not been executed is read out from the entry "10" of the
instruction history table 32, for which reason the data dependence
speculative execution of the thread "2" by the processor 50 is judged to
have succeeded with respect to the thread "1" executed by the processor 51.

The thread "3" executed by the processor 53 is post in program
sequence to the thread "2" subjected to the data dependence speculative
execution by the processor 50. The thread "3" is therefore not subject to
the read after write dependence detection. The address "A3" of the store
instruction "ST3" included in the thread "3" is the same as the address of
the load instruction "LD3" executed by the processor 50 in cycle "1".
However, the load instruction "LD3" is prior in program sequence to the
store instruction "ST3", so that no read after write dependence can be
present from the store instruction "ST3" to the load instruction "LD3".
The instruction history table 32 in the processor 50 does not judge the
success or the failure of the data dependence speculative execution with
respect to the store instruction included in the thread "3" executed by
the processor 53. The instruction history table 32 in the processor 50
judges that the data dependence speculative execution has succeeded in
cycle 2 and subsequently continues the thread execution.

In cycle "3", the processor 50 executes the load instruction
"LD4" included in the thread "2" and the processor 52 executes the store
instruction "ST1" included in the thread "0". The address "A4" of the load
instruction "LD4" is converted by the hash function circuit 31 into the
entry number "11" of the instruction history table 32, and the content of
the entry "11" of the instruction history table 32 is set to the logical
"1" indicating that the load instruction has been executed. The thread "0"
executed by the processor 52 is prior in program sequence to the thread
"2" subjected to the data dependence speculative execution by the
processor 50. The thread "0" is therefore subject to the read after write
dependence detection, namely the detection of the presence of the read
after write dependence from the store instruction "ST1" executed by the
processor 52 to the load instruction executed by the self-processor 50.
The address "A3" of the store instruction "ST1" is converted by the hash
function circuit 31 into the entry number "00" of the instruction history
table 32. The data dependence detector 30 reads out the content of the
entry "00" of the instruction history table 32 and judges the presence of
the read after write dependence from the store instruction "ST1" to the
load instruction executed by the self-processor 50. In this case, the
logical "1" stored in cycle "1" by the load instruction "LD3" is read out
from the entry "00" of the instruction history table 32. Namely, the read
after write dependence from the store instruction "ST1" to the load
instruction "LD3" is detected. For this reason, the data dependence
speculative execution of the thread "2" by the processor 50 is judged to
have failed, and the failure of the data dependence speculative execution
is notified through the data dependence detected result 36 to the thread
control unit 55.
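
The three cycles described above for the processor 50 can be replayed in a
short Python sketch. It is a simplified model under stated assumptions: the
entry numbers are taken directly from the description of FIG. 9C instead of
being computed by a hash function, thread numbers are assumed to equal the
program sequence, and the function and variable names are introduced here
for illustration only.

```python
SELF_THREAD = 2  # the processor 50 executes the thread "2"

# (cycle, thread, instruction, kind, entry number after hashing)
EVENTS = [
    (1, 2, "LD3", "load", 0b00),
    (2, 1, "ST2", "store", 0b10),
    (2, 3, "ST3", "store", 0b00),
    (3, 2, "LD4", "load", 0b11),
    (3, 0, "ST1", "store", 0b00),
]


def run_detector(events, self_thread, entries=4):
    """Replay the events against one instruction history table."""
    table = [0] * entries      # cycle 0: every entry is logical "0"
    failures = []
    for cycle, thread, name, kind, entry in sorted(events, key=lambda e: e[0]):
        if thread == self_thread and kind == "load":
            table[entry] = 1   # write port: own speculative load
        elif thread < self_thread and kind == "store":
            if table[entry] == 1:  # read port: store of a prior thread
                failures.append((cycle, name))
        # Stores of threads post in program sequence are not checked.
    return table, failures


FINAL_TABLE, FAILURES = run_detector(EVENTS, SELF_THREAD)
```

In this replay the store "ST3" in cycle 2 hits the same entry as "LD3" but
is ignored because the thread "3" is post in program sequence, while the
store "ST1" in cycle 3 produces the failure.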

The thread control unit 55 receives the notice of the failure of the
data dependence speculative execution from the processor 50, and the
thread control unit 55 sends a request for the recovery process against
the failure of the data dependence speculative execution to the processor
which has supplied the above notice of the failure, namely the processor
50, and to the processor 53 executing the thread "3" which is post in
program sequence to the thread "2" subjected to the data dependence
speculative execution.
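
The selection of the processors which receive the recovery request can be
sketched as follows. This Python fragment is illustrative only; the function
name and the dictionary layout are assumptions, and thread numbers are
assumed to equal the program sequence.

```python
def recovery_targets(thread_of_processor, failed_processor):
    """Processors to be sent the recovery request: the one which reported the
    failure and every one executing a thread post in program sequence."""
    failed_thread = thread_of_processor[failed_processor]
    return sorted(
        processor
        for processor, thread in thread_of_processor.items()
        if thread >= failed_thread
    )


# Thread allocation of the example: processor number -> thread number.
ASSIGNMENT = {50: 2, 51: 1, 52: 0, 53: 3}
TARGETS = recovery_targets(ASSIGNMENT, failed_processor=50)
```

With the allocation of the example, the targets are the processors 50 and
53, while the processors 51 and 52 executing the prior threads are left
undisturbed.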

The detection of the read after write dependence by the data
dependence detector 30 in the data dependence speculative execution as
well as the judgement of the success and the failure of the data dependence
speculative execution have been described with reference to the processor
50. In each of the processor 51 executing the thread "1", the processor 52
executing the thread "0", and the processor 53 executing the thread "3",
the data dependence detector 30 likewise detects the read after write
dependence and judges the success and the failure of the data dependence
speculative execution. For the processors 51 and 52, the read after write
dependence is not detected and the success of the data dependence
speculative execution is judged. For this reason, their executions have
been completed in cycle 3. For the processor 53, the read after write
dependence is not detected and the success of the data dependence
speculative execution is judged. However, since the processor 50 has the
failure of the data dependence speculative execution of the thread "2"
which is prior in program sequence to the thread "3" executed by the
processor 53, the recovery process for the failure of the data dependence
speculative execution is requested of the processor 53 as well.

The processors 50 and 53 are requested for the recovery process
for the failure of the data dependence speculative execution. In the
processors 50 and 53, the processor control unit 57 receives the request
from the recovery process request 47, cancels the executed results of
the thread up to the cycle 3 and subsequently starts the re-execution of
the thread in the cycle 5. The threads "0" and "1" have been executed
completely. The thread "2" is the most prior thread among the unexecuted
threads "2" and "3". The thread "2" is therefore executed definitively. The
processor 50 executes the load instruction "LD3" in cycle 5, the other
instruction in cycle 6, and the load instruction "LD4" in cycle 7
definitively. The read after write dependence is present from the store
instruction "ST1" included in the thread "0" to the load instruction "LD3"
executed in cycle 5. The store instruction "ST1" has already been executed
by the processor 52 in cycle 3, so that there is no violation of the read
after write dependence. The read after write dependence from the thread
"2" to the thread "3", namely the read after write dependence from the
store instruction included in the thread "2" to the load instruction
included in the thread "3", is unknown. The data dependence speculative
execution is therefore performed again for the thread "3". The processor
53 speculatively executes the load instruction "LD5" in cycle 5, the store
instruction "ST3" in cycle 6 and the other instruction in cycle 7. In this
case, the read after write dependence is not detected from the thread "2".
The data dependence speculative execution is judged to have succeeded. The
execution of the thread "3" has been completed in cycle 7.

The operations of the data dependence speculative execution have
been described above. The program including the twelve instructions is
executed by the four processors in the thread parallel processings. If the
data dependence speculative execution succeeds, then the execution of the
program is completed in three cycles. If the data dependence speculative
execution has one failure, then the execution of the program is completed
in seven cycles.
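
The trade-off between the two outcomes can be illustrated by a simple
average. The Python sketch below is not part of the embodiment: it assumes
that a run either succeeds (three cycles) or has exactly one failure (seven
cycles), as in the example, and the function name and the probability
parameter are introduced here for illustration only.

```python
def expected_cycles(p_success, cycles_success=3, cycles_one_failure=7):
    """Average cycle count when a run succeeds with probability p_success and
    otherwise suffers exactly one failure (an assumption of this sketch)."""
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must lie between 0 and 1")
    return p_success * cycles_success + (1.0 - p_success) * cycles_one_failure


ALWAYS_SUCCEEDS = expected_cycles(1.0)
NEVER_SUCCEEDS = expected_cycles(0.0)
HALF_AND_HALF = expected_cycles(0.5)
```

Under these assumptions the average lies between three and seven cycles and
approaches three as the probability of success rises.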

The above description has been made assuming that each thread
includes only three instructions. Actually, however, a thread may include
a large number of instructions, for example, several tens of instructions
to several tens of thousands of instructions. A large number of load
instructions are then subjected to the data dependence speculative
execution.

The conventional data dependence detector 100 needs to store the
addresses of all of the load instructions. If no free space remains in the
address buffer 101, the conventional data dependence detector 100 is no
longer capable of performing the data dependence speculative execution,
and is capable of performing only the sequential and definitive execution.
The conventional data dependence detector 100 therefore does not provide
a sufficient effect of improving the execution speed of the program by the
data dependence speculative execution in the thread parallel processings.
In order to obtain a sufficient effect of improving the execution speed of
the program, it is necessary to provide a large hardware size.

By contrast, the data dependence detector 30 of this embodiment
allows the addresses of the load/store instructions to be allocated to the
entries with overlap, so that it is unnecessary to store the addresses of
all of the load instructions. As a result, there is no limitation on the
number of load instructions which may be subjected to the data dependence
speculative execution. The necessary hardware size is also small. Since
the necessary time for detecting the read after write dependence is short,
it is possible to improve the high speed and high frequency performances
of the processor. The data dependence detector 30 selects the threads
subject to the read after write dependence detection based on the thread
sequence 40 supplied from the thread control unit 55. The data dependence
detector 30 is thus capable of correctly detecting the read after write
dependence while responding flexibly to the thread sequence.

The data dependence detector 30 refers to the thread sequence 40
supplied from the thread control unit 55 and designates the processor
executing the prior thread which is prior in program sequence to the thread
executed by the self-processor. Only when the processor for the prior
thread executes the store instruction is the read out operation from the
instruction history table 32 made, and subsequently the logical-OR
operation is made by the logical-OR function 41 to realize the detection of
the presence of the read after write dependence from the prior thread to the
thread executed by the self-processor.

Alternatively, it is possible that, if any other processor executes the
store instruction, the read out operation is always made from the
instruction history table 32 independently of whether the thread of that
processor is prior to the thread of the self-processor. In this case, when
the logical-OR function 41 performs the logical-OR operation, the data
dependence detector 30 refers to the thread sequence 40 supplied from the
thread control unit 55 and designates the processor executing the prior
thread which is prior in program sequence to the thread executed by the
self-processor, so as to detect the presence of the read after write
dependence from the prior thread to the thread executed by the
self-processor.
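
The two alternatives described above, selecting the prior threads before
the read out or masking the read out results at the logical-OR, can be
compared in a short Python sketch. The function names and data layout are
assumptions introduced here for illustration, and the sketch models only
the logical result, not the circuit timing.

```python
def detect_select_then_read(table, hashed_stores, prior_processors):
    """Read out only for stores executed by processors of prior threads, then
    take the logical OR of the bits read."""
    bits = [
        table[entry]
        for processor, entry in hashed_stores.items()
        if processor in prior_processors
    ]
    return any(bits)


def detect_read_then_mask(table, hashed_stores, prior_processors):
    """Read out for every store, then mask each bit with the thread sequence
    when taking the logical OR."""
    bits = {processor: table[entry] for processor, entry in hashed_stores.items()}
    return any(bit and processor in prior_processors for processor, bit in bits.items())


PRIOR = {51, 52}                       # threads "1" and "0" precede the thread "2"
TABLE_CYCLE_2 = [1, 0, 0, 0]           # table of the processor 50 after "LD3"
STORES_CYCLE_2 = {51: 0b10, 53: 0b00}  # processor -> hashed entry of its store
TABLE_CYCLE_3 = [1, 0, 0, 1]           # after "LD4" as well
STORES_CYCLE_3 = {52: 0b00}

RESULTS = [
    (detect_select_then_read(t, s, PRIOR), detect_read_then_mask(t, s, PRIOR))
    for t, s in [(TABLE_CYCLE_2, STORES_CYCLE_2), (TABLE_CYCLE_3, STORES_CYCLE_3)]
]
```

Both variants give the same judgement for the cycles of the example: no
dependence in cycle 2 and a dependence in cycle 3.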

For the data dependence speculative execution in thread units
in the thread parallel processings, if the load instruction is executed by
the self-processor, then the address of the load instruction is converted
by the hash function circuit 31 into the entry number of the instruction
history table 32, so that a value indicating that the load instruction has
been executed is stored into the entry designated by the entry number. If
the store instruction is executed by the other processor, then the address
of the store instruction is converted by the hash function circuit 31 into
the entry number of the instruction history table 32, so that a value
indicating whether or not the load instruction has been executed is read
out from the entry designated by the entry number. As a result, the
possibility of the presence of the read after write dependence from the
store instruction executed by the other processor to the load instruction
executed by the self-processor is detected. The above described structure
may also be applicable to detect the possibility of a write after read
dependence or the possibility of a write after write dependence.

For example, if the store instruction has been executed by the
self-processor, the address of the store instruction is converted by the
hash function circuit 31 into the entry number of the instruction history
table 32. The value indicating that the store instruction has been executed
by the self-processor is stored into the entry of the instruction history
table 32 which is indicated by the entry number as converted. If the load
instruction has been executed by the other processor, the address of the
load instruction is converted by the hash function circuit 31 into the
entry number of the instruction history table 32. The value indicating
that the store instruction has been executed by the self-processor is read
out from the entry of the instruction history table 32 which is indicated
by the entry number as converted. As a result, the possibility of the
presence of the write after read dependence from the load instruction
executed by the other processor to the store instruction executed by the
self-processor is detected by the data dependence detector.

As another example, if the store instruction has been executed by the
other processor, the address of the store instruction is converted by the
hash function circuit 31 into the entry number of the instruction history
table 32. The value indicating that the store instruction has been executed
by the self-processor is read out from the entry of the instruction history
table 32 which is indicated by the entry number as converted. In addition,
if the store instruction has been executed by the self-processor, the
address of the store instruction is converted by the hash function circuit
31 into the entry number of the instruction history table 32. The value
indicating that the store instruction has been executed by the
self-processor is stored into the entry of the instruction history table
32 which is indicated by the entry number as converted. As a result, the
possibility of the presence of the write after write dependence from the
store instruction executed by the other processor to the store instruction
executed by the self-processor is detected by the data dependence detector.
As a further example, if the load instruction has been executed by the
other processor, the address of the load instruction is converted by the
hash function circuit 31 into the entry number of the instruction history
table 32. The value indicating that the store instruction has been executed
by the self-processor is read out from the entry of the instruction history
table 32 which is indicated by the entry number as converted. In addition,
if the store instruction has been executed by the other processor, the
address of the store instruction is converted by the hash function circuit
31 into the entry number of the instruction history table 32. The value
indicating that the load instruction has been executed by the
self-processor as well as the value indicating that the store instruction
has been executed by the self-processor are read out from the entry of the
instruction history table 32 which is indicated by the entry number as
converted. If the load instruction has been executed by the
self-processor, the address of the load instruction is converted by the
hash function circuit 31 into the entry number of the instruction history
table 32. The value indicating that the load instruction has been executed
by the self-processor is stored into the entry of the instruction history
table 32 which is indicated by the entry number as converted. In addition,
if the store instruction has been executed by the self-processor, the
address of the store instruction is converted by the hash function circuit
31 into the entry number of the instruction history table 32. The value
indicating that the store instruction has been executed by the
self-processor is stored into the entry of the instruction history table
32 which is indicated by the entry number as converted. As a result, the
possibility of the presence of the read after write dependence from the
store instruction executed by the other processor to the load instruction
executed by the self-processor, the possibility of the presence of the
write after read dependence from the load instruction executed by the
other processor to the store instruction executed by the self-processor,
and the possibility of the presence of the write after write dependence
from the store instruction executed by the other processor to the store
instruction executed by the self-processor are detected.
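
The combined detection described above can be modelled with two values per
entry, one for loads and one for stores executed by the self-processor. The
Python sketch below is an illustration under assumptions: the class and
method names and the low-order-bit hash are introduced here and are not
taken from the embodiment.

```python
class DependenceTable:
    """Per entry: whether the self-processor has loaded, and has stored."""

    def __init__(self, index_bits=2):
        self.mask = (1 << index_bits) - 1
        self.loaded = [False] * (1 << index_bits)
        self.stored = [False] * (1 << index_bits)

    def _entry(self, address):
        return address & self.mask  # assumed hash: low-order address bits

    def self_load(self, address):
        self.loaded[self._entry(address)] = True

    def self_store(self, address):
        self.stored[self._entry(address)] = True

    def other_load(self, address):
        # A load by the other processor meeting an own store:
        # possible write after read dependence.
        return {"WAR"} if self.stored[self._entry(address)] else set()

    def other_store(self, address):
        # A store by the other processor meeting an own load or an own store:
        # possible read after write or write after write dependence.
        entry = self._entry(address)
        found = set()
        if self.loaded[entry]:
            found.add("RAW")
        if self.stored[entry]:
            found.add("WAW")
        return found


table = DependenceTable()
table.self_load(0x10)    # marks the load value of entry 0
table.self_store(0x21)   # marks the store value of entry 1
raw = table.other_store(0x10)
waw = table.other_store(0x21)
war = table.other_load(0x21)
none_load = table.other_load(0x10)
none_store = table.other_store(0x12)
```

As with the single-value table, each result indicates only the possibility
of a dependence, because different addresses may share an entry.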

The data dependence detector 30 is capable of detecting the read
after write dependence between the threads but incapable of detecting the
read after write dependence between the instructions in the same thread.
The data dependence detector 30 allows the data dependence speculative
execution in thread units in the thread parallel processings but does not
allow the data dependence speculative execution in instruction units
within a thread.

The data dependence detector 30 provides the effect that the
necessary hardware size for the data dependence speculative execution in
thread units in the thread parallel processings is small. The number of
load instructions which may be subjected to the data dependence
speculative execution is not limited by the hardware size. Even a small
hardware size may obtain the improvement of the processing speed by the
data dependence speculative execution in thread units in the thread
parallel processings. The necessary time for detecting the dependence
between the threads is short. The high speed and high frequency
performances of the processor may be improved.

THIRD EMBODIMENT

A third embodiment according to the present invention will be
described in detail with reference to the drawings. FIG. 10 is a diagram
illustrative of a structure of a processor 59 included in a multiple
processor system in a third embodiment in accordance with the present
invention. FIG. 7C is a diagram illustrative of a first example of the hash
function circuit in the third embodiment in accordance with the present
invention. The system includes plural processors including the processor
59, and a thread control unit not illustrated. The processor 59 has almost
the same internal structure as the processors 50, 51, 52 and 53 described
in the above second embodiment.

If the thread allocated by the thread control unit is executed by
the processor 59, a processor control unit 63 in the processor 59 supplies a
data dependence detector 60 with a speculative execution flag 33 which
indicates whether the instructions included in the thread should be executed
definitively in accordance with the program sequence or executed
speculatively independently of the program sequence. The processor
control unit 63 supplies the kind of the instruction to be executed to the
instruction execution unit 18 and the data dependence detector 60 as well
as to the other three processors through the execution instructions 46. The
data dependence detector 60 receives the input of the kind of the instruction
to be executed from the execution instruction 34. The data dependence
detector 60 also receives the inputs of the kinds of the instructions to be
executed by the other three processors from the execution instructions 38.
If the instruction to be executed is the load instruction or the store
instruction, the processor control unit 63 supplies the address of the
instruction to the instruction execution unit 18 and the data dependence
detector 60 as well as to the other three processors through the load/store
addresses 45. The data dependence detector 60 receives the input of the
address of the load/store instruction from the load/store address 35.
Likewise, if the instructions to be executed by the other three processors
are the load or store instructions, then the data dependence detector 60
receives the inputs of the addresses of the load/store instructions from the
load/store addresses 37.

The data dependence detector 60 includes four hash function
circuits 31, an instruction history table 62 and a logical-OR circuit 61. The
data dependence detector 60 receives inputs of the speculative execution
flag 33, the execution instruction 34, and the load/store address 35 from the
processor control unit 63 in the self-processor, and also receives inputs of
the execution instructions 38 and the load/store addresses 37 from the
other three processors. The data dependence detector 60 also receives
inputs of the speculative execution flag 39 and the thread sequence 40 from
the thread control unit. The data dependence detector 60 has a data
dependence detected result 64 which is supplied to the thread control unit.

Each of the processors has four hash function circuits 31, each of
which comprises a logic circuit realizing a hash function "f" which
converts the address of the load instruction or the store instruction into the
entry number of the instruction history table 62. The hash function "f"
takes the same value for the same input. If N1 = f(A1) and N2 = f(A2), then
A1 = A2 causes N1 = N2. One of the four hash function circuits 31 converts
the address of the load or store instruction to be executed by the
self-processor into the entry number of the instruction history table 62. Each of
the remaining three hash function circuits 31 converts the address of the
store instruction to be executed by the respective one of the other three
processors into the entry number of the instruction history table 62.
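
The property stated above for the hash function "f" can be illustrated by a
minimal Python sketch. The function name hash_f, the 4-byte word size and
the choice of n = 4 are assumptions made only for illustration; the
embodiment does not restrict the hash function circuit 31 to this form.

```python
ENTRY_BITS = 4   # assumed n: the table then has 2**n = 16 entries
WORD_SHIFT = 2   # assumed 4-byte words: drop the byte offset within a word


def hash_f(address: int, entry_bits: int = ENTRY_BITS) -> int:
    """Map a load/store address to an entry number of the history table."""
    return (address >> WORD_SHIFT) & ((1 << entry_bits) - 1)


a1, a2 = 0x1000, 0x1000
assert hash_f(a1) == hash_f(a2)            # A1 = A2 gives N1 = N2

# Different addresses may share an entry, so N1 = N2 does not imply A1 = A2.
assert hash_f(0x1004) == hash_f(0x1044)
```

Under these assumptions, two distinct addresses can map to the same entry,
which is why the detector can report only the possibility of a dependence.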

The instruction history table 62 comprises a plurality of entries,
each of which stores one-bit information. If the output from the hash function
circuit 31 has an n-bit width, then a total number of the entries is equal to
the n-th power of 2. The instruction history table 62 has a single write port
and three read ports for performing a single write operation and three read
operations concurrently. One of the four hash function circuits 31 has the
input port of the address of the load instruction to be executed by the
self-processor. This hash function circuit 31 has an output port which is
connected to the write port of the instruction history table 62 for write
operation to the entry designated by the entry number supplied from the
hash function circuit 31. The remaining three hash function circuits 31,
which have the input ports of the addresses of the store instructions to be
executed by the other three processors, each have an output port which is
connected to a corresponding read port of the instruction history table 62.
The instruction history table 62 is capable of initializing all of the entries
upon receipt of the input of the speculative execution flag 39 from the
thread control unit and of the speculative execution flag 33 supplied from
the processor control unit 63 and inputted into the data dependence
detector 60. The logic-OR circuit 61 operates a logic-OR of the outputs of
the single read/write port and the three read ports of the instruction history
table 62, and this operated result is supplied to the data dependence
detected result 64 which is transmitted to the thread control unit.
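
The table organization described above can be sketched in software as
follows. This is a behavioral model under stated assumptions, not the
circuit itself: the class name InstructionHistoryTable, its method names
and the helper logic_or are hypothetical, and port concurrency is not
modeled.

```python
from typing import Iterable, List


class InstructionHistoryTable:
    """2**n one-bit entries; 0 = no speculative load recorded, 1 = recorded."""

    def __init__(self, entry_bits: int) -> None:
        self.entries: List[int] = [0] * (1 << entry_bits)

    def initialize(self) -> None:
        """Return every entry to the first state (logic 0)."""
        self.entries = [0] * len(self.entries)

    def write(self, entry: int) -> None:
        """Write port: put the designated entry into the second state (logic 1)."""
        self.entries[entry] = 1

    def read(self, entry: int) -> int:
        """Read port: return the content of the designated entry."""
        return self.entries[entry]


def logic_or(read_outputs: Iterable[int]) -> int:
    """Model of the logic-OR taken over the read-port outputs."""
    return int(any(read_outputs))


table = InstructionHistoryTable(entry_bits=3)          # 8 entries
table.write(5)                                         # a load mapped to entry 5
result = logic_or(table.read(e) for e in (1, 5, 7))    # -> 1, entry 5 is set
```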

Operations of the data dependence detector 60 will be described.
In an initial state, all of the entries of the instruction history table 62 are in
the first state, for example, logic "0" which indicates that the load
instruction has not been executed by the self-processor. If the self-processor
does not perform the speculative instruction execution to the data
dependence between the threads and does perform the thread execution of
the threads in the order defined by the program (hereinafter referred to as
the "definitive execution state"), and further if the self-processor does not
perform the speculative instruction execution to the data dependence
between the instructions included in the thread and does perform the
instruction execution in the order defined by the program, then the data
dependence detector 60 does not perform any operation.

If the self-processor does perform the speculative instruction
execution to the data dependence between the threads, then the data
dependence detector 60 does perform a detecting process for a read after
write dependence from the store instruction to the load instruction, wherein
the store instruction is to be executed by the processor allocated with a
prior thread which is prior in program sequence to the thread to be executed
by the self-processor, whilst the load instruction is to be executed by the
self-processor.

The speculative execution flag 39 supplied from the thread
control unit and inputted into the data dependence detector 60 indicates
whether the self-processor is in the definitive execution state or the
speculative execution state. Whether the thread to be executed by the
self-processor is prior in program sequence to the threads to be executed by the
other three processors is judged by the thread sequence 40 which is
supplied from the thread control unit and inputted into the data dependence
detector 60. The kinds of the instructions being executed by the other three
processors are judged by the execution instructions 38 supplied from the
other three processors into the data dependence detector 60.

If the self-processor is in the speculative instruction execution
state to the data dependence between instructions included in the thread,
the data dependence detector 60 does perform a detecting process for a read
after write dependence from the store instruction to the load instruction,
wherein the store instruction has previously been executed by the
self-processor, whilst the load instruction is to be executed by the self-processor.
The speculative execution flag 33 supplied from the processor control unit
63 and inputted into the data dependence detector 60 indicates whether the
self-processor is in the definitive execution state or the speculative execution
state for execution of the instructions included in the thread. The kinds of
the instructions executed by the self-processor are judged by the execution
instructions 34 supplied from the processor control unit 63 into the data
dependence detector 60.

If the self-processor is in the speculative execution state to the
thread executions or the instruction executions in the thread, then after the
self-processor performs the speculative execution of the load instruction,
the data dependence detector 60 records in the instruction history table 62
that the load instruction has been speculatively executed. In more detail,
the address of the load instruction is converted by the hash function circuit
31 into the entry number of the instruction history table 62 for input into
the write port of the instruction history table 62. The instruction history
table 62 converts the entry corresponding to the inputted entry number into a
second state, for example, a logical value "1" which indicates that the load
instruction has speculatively been executed.

If the self-processor is in the speculative execution state for
execution of the threads and if the store instruction is executed by the
processor which is allocated with the prior thread which is prior in program
sequence to the thread to be executed by the self-processor, then the data
dependence detector 60 reads out the state from the corresponding entry of
the instruction history table 62. In more detail, the address of the store
instruction is converted by the hash function circuit 31 into the entry
number of the instruction history table 62 for input into the corresponding
one of the three input ports of the instruction history table 62. The
instruction history table 62 reads out the content of the entry designated by
the entry number inputted and supplies the content to the logic-OR circuit
61. If the self-processor is in the speculative execution state for execution
of the instructions included in the thread and if the store instruction is
executed by the self-processor, then the data dependence detector 60 reads
out the state from the corresponding entry of the instruction history table 62.
In more detail, the address of the store instruction is converted by the hash
function circuit 31 into the entry number of the instruction history table 62
for input into the corresponding one of the input ports of the instruction
history table 62. The instruction history table 62 reads out the content of
the entry designated by the entry number inputted and supplies the content
to the logic-OR circuit 61. The logic-OR circuit 61 operates the logic-OR
of the outputs from the read ports of the instruction history table 62
and supplies the operated result to the data dependence detected result 64
which is transmitted to the thread control unit.

In case that the threads are executed by the data dependence
speculative execution, if the address of the store instruction executed by
any of the processors executing a prior thread which is prior in program
sequence to the thread executed by the self-processor is either equal to the
address of the load instruction subjected to the data dependence speculative
execution by the self-processor or allocated to the same entry of the
instruction history table 62 due to the occurrence of aliasing, then the value
indicating that the load instruction subjected to the data dependence
speculative execution is present is read out from the instruction history
table 62, and the possibility of the presence of the read after write dependence
between the threads is detected. In this case, the data dependence detector
60 supplies the value indicating the failure of the data dependence
speculative execution, for example, logical "1", onto the data dependence
detected result 64 which is then transmitted to the thread control unit.

In case that the instructions in the thread are executed by the data
dependence speculative execution, if the address of the store instruction
executed by the self-processor is either equal to the address of the load
instruction subjected to the data dependence speculative execution by the
self-processor or allocated to the same entry of the instruction history table
62 due to the occurrence of aliasing, then the value indicating that the load
instruction subjected to the data dependence speculative execution is
present is read out from the instruction history table 62, and the possibility of
the presence of the read after write dependence between the instructions in
the thread is detected. In this case, the data dependence detector 60 supplies
the value indicating the failure of the data dependence speculative
execution, for example, logical "1", onto the data dependence detected
result 64 which is then transmitted to the thread control unit.
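
The two cases above, an equal address and a different address allocated to
the same entry, can be sketched as follows. This is a simplified model
under stated assumptions: the class DataDependenceDetector and its method
names are hypothetical, the hash is assumed to be the low-order bits of the
word address, and the speculative execution flags and thread sequence are
omitted.

```python
class DataDependenceDetector:
    """Behavioral sketch: one-bit entries set by loads, examined by stores."""

    def __init__(self, entry_bits: int = 4) -> None:
        self.mask = (1 << entry_bits) - 1
        self.table = [0] * (1 << entry_bits)

    def _entry(self, address: int) -> int:
        # Assumed hash: low-order bits of the 4-byte word address.
        return (address >> 2) & self.mask

    def speculative_load(self, address: int) -> None:
        """Record a load that the self-processor executed speculatively."""
        self.table[self._entry(address)] = 1

    def store(self, address: int) -> int:
        """Return 1 if a read after write dependence may have been violated."""
        return self.table[self._entry(address)]


det = DataDependenceDetector()
det.speculative_load(0x2008)
same_address = det.store(0x2008)   # equal address: failure reported
aliased = det.store(0x2048)        # different address, same entry: also reported
unrelated = det.store(0x200C)      # different entry: nothing reported
```

In this sketch an aliased store can only cause an unnecessary failure
report; a store to the same address as a recorded load always maps to the
same entry and is therefore never missed.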

The thread control unit receives the notice of the failure of the
data dependence speculative execution from any of the processors including
the processor 59, and the thread control unit sends a request for recovery
process against the failure of the data dependence speculative execution to
the processor which has supplied the above notice of the failure of the data
dependence speculative execution as well as to the processors executing the
threads post in program sequence to the thread executed by the processor
which has supplied the above notice. In each processor which has received
the request for the recovery process against the failure of the data dependence
speculative execution, the processor control unit 63 receives the
notice of the request for the recovery process from the recovery execution
request 47.

The reason why the request for the recovery process against the
failure of the data dependence speculative execution, whether in the thread
units or in the instruction units in the thread, is sent not only to the
processor which has supplied the above notice of the failure but also to the
processors executing the threads post in program sequence to the thread
executed by that processor is as follows. It is possible that the processor
which has had the failure of the data dependence speculative execution of
the instructions in the thread has subjected the load instruction to the data
dependence speculative execution in violation of the read after write
dependence. It is therefore possible that the executed result of the thread
with the failure of the data dependence speculative execution, which has
been executed by that processor, is not correct. It is thus possible that the
executed result of the thread post in program sequence to the thread with
the failure of the data dependence speculative execution is also not correct.
Consequently, it is possible that the executed results of any threads which
are post in program sequence to the thread with the failure of the data
dependence speculative execution are not correct. For those reasons, the
request for the recovery process against the failure of the data dependence
speculative execution is sent not only to the processor which has supplied
the above notice of the failure of the data dependence speculative execution
but also to the processors executing the threads post in program sequence to
the thread executed by the processor which has supplied the above notice,
in order to ensure the correctness of the executed results of the threads.

In the processor having received the request for the recovery
process against the failure of the data dependence speculative execution of
the thread unit or the instruction unit included in the thread, the processor
control unit 63 is notified of the request for the recovery process against the
failure of the data dependence speculative execution from the recovery
process execution request 47. In this case, each of the processors performs
the necessary recovery process to ensure the correctness of the executed
result of the threads.
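
The selection of the processors that receive the recovery request can be
sketched as follows. The function recovery_targets, the processor names and
the representation of the thread sequence as integer positions are
assumptions made for illustration only.

```python
from typing import Dict, List


def recovery_targets(thread_order: Dict[str, int], notifier: str) -> List[str]:
    """Processors that receive the recovery request.

    thread_order maps a processor name to the program-sequence position of
    the thread it executes (a smaller value is prior in program sequence).
    The notifier and every processor running a later thread are selected.
    """
    failed_position = thread_order[notifier]
    return sorted(
        (p for p, pos in thread_order.items() if pos >= failed_position),
        key=lambda p: thread_order[p],
    )


# Hypothetical allocation: P2 runs the earliest thread, P1 the latest.
order = {"P0": 1, "P1": 3, "P2": 0, "P3": 2}
targets = recovery_targets(order, notifier="P3")   # -> ["P3", "P1"]
```

Processors executing threads prior to the failed thread are left untouched
in this sketch, since their results do not depend on the failed thread.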

As described above, the data dependence detector 60 of this
embodiment is capable not only of detecting the read after write
dependence from the store instruction included in the prior thread, which is
prior in program sequence to the thread executed by the self-processor, to
the load instruction included in the thread executed by the self-processor,
but also of detecting the read after write dependence between instructions
included in the thread executed by the self-processor, namely, the read after
write dependence from the store instruction included in the thread executed
by the self-processor to the load instruction included in the thread executed
by the self-processor. The above two functions are realized in that the
instruction history table 62 in the data dependence detector 60 has
read/write ports for the write operation in the data dependence speculative
execution of the load instruction by the self-processor and also for the read
operation in the data dependence speculative execution of the store
instruction by the self-processor. In the thread parallel processings, the data
dependence speculative execution in the thread units is realized for
obtaining a high speed execution of the program by the parallel executions
of the threads, and the data dependence speculative execution in the
instruction units in the thread is also realized, whereby the program
execution independent from the program sequence can be realized at high
speed.

The data dependence detector 60 refers to the thread sequence 40
supplied from the thread control unit and designates the other processor
executing the prior thread which is prior in program sequence to the thread
executed by the self-processor. Only when the other processor for the prior
thread executes the store instruction is the read out operation from the
instruction history table 62 made, and subsequently the logical-OR
operation is made by the logical-OR function 61 to realize the detection of
the presence of the read after write dependence from the prior thread to the
thread executed by the self-processor.

Alternatively, it is possible that, if the other processor executes the store
instruction, the read out operation is always made from the instruction
history table 62 independently of whether that processor executes a thread
prior to that of the self-processor. In this case, when the logical-OR function
61 performs the logical-OR operation, the data dependence detector 60
refers to the thread sequence 40 supplied from the thread control unit and
designates the processor executing the prior thread which is prior in
program sequence to the thread executed by the self-processor, so as to
detect the presence of the read after write dependence from the prior thread
to the thread executed by the self-processor.
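
The two alternatives described above, reading the table only for prior
threads or always reading and selecting at the logical-OR, can be compared
in a short sketch. The function names and the list-based representation of
the table, the entry numbers and the thread sequence are hypothetical.

```python
from typing import Sequence


def detect_gated_read(table: Sequence[int], entries: Sequence[int],
                      is_prior: Sequence[bool]) -> int:
    """Read a port only when its processor executes a prior thread."""
    outputs = [table[e] for e, prior in zip(entries, is_prior) if prior]
    return int(any(outputs))


def detect_masked_or(table: Sequence[int], entries: Sequence[int],
                     is_prior: Sequence[bool]) -> int:
    """Always read every port, then ignore non-prior processors at the OR."""
    outputs = [table[e] for e in entries]
    return int(any(bit and prior for bit, prior in zip(outputs, is_prior)))


history = [0, 1, 0, 0]                  # entry 1 holds a recorded load
store_entries = [1, 2, 3]               # entries hit by the three other processors
thread_sequence = [False, True, True]   # the first processor runs a later thread

gated = detect_gated_read(history, store_entries, thread_sequence)
masked = detect_masked_or(history, store_entries, thread_sequence)
```

Both functions give the same result for the same inputs; they differ only
in where the thread sequence is applied.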

For the data dependence speculative execution in the thread unit
in the thread parallel processings, if the load instruction is executed by the
self-processor, then the address of the load instruction is converted by the
hash function circuit 31 into the entry number of the instruction history
table 62, so that a value indicating that the load instruction has been
executed is stored into the entry designated by the entry number. If the
store instruction is executed by the self-processor or the other processor,
then the address of the store instruction is converted by the hash function
circuit 31 into the entry number of the instruction history table 62, so that a
value indicating whether or not the load instruction has been executed by
the self-processor is read out from the entry designated by the entry number.
As a result, the possibility of the presence of the read after write
dependence from the store instruction executed by the self-processor or the
other processor to the load instruction executed by the self-processor is
detected. The above described structure may also be applicable to detect
the possibility of a write after read dependence or the possibility of a write
after write dependence.

For example, if the store instruction has been executed by the
self-processor, the address of the store instruction is converted by the hash
function circuit 31 into the entry number of the instruction history table 62.
The value indicating that the store instruction has been executed by the
self-processor is stored into the entry of the instruction history table 62,
which is indicated by the entry number as converted. If the load instruction
has been executed by the self-processor or the other processor, the address
of the load instruction is converted by the hash function circuit 31 into the
entry number of the instruction history table 62. The value indicating that
the store instruction has been executed by the self-processor is read out
from the entry of the instruction history table 62, which is indicated by the
entry number as converted. As a result, the possibility of the presence of
the write after read dependence from the load instruction executed by the
self-processor or the other processor to the store instruction executed by the
self-processor is detected by the data dependence detector 60.

For example, if the store instruction has been executed by the
other processor, the address of the store instruction is converted by the hash
function circuit 31 into the entry number of the instruction history table 62.
The value indicating that the store instruction has been executed by the
self-processor is read out from the entry of the instruction history table 62,
which is indicated by the entry number as converted. In addition, if the
store instruction has been executed by the self-processor, the address of the
store instruction is converted by the hash function circuit 31 into the entry
number of the instruction history table 62. The value indicating that the
store instruction has been executed by the self-processor is read out from
the entry of the instruction history table 62, which is indicated by the entry
number as converted, and in addition, the value indicating that the store
instruction has been executed by the self-processor is stored into the entry
of the instruction history table 62, which is indicated by the entry number
as converted. As a result, the possibility of the presence of the write after
write dependence from the store instruction executed by the self-processor
or the other processor to the store instruction executed by the self-processor
is detected by the data dependence detector 60.

For example, if the load instruction has been executed by the
self-processor or the other processor, the address of the load instruction is
converted by the hash function circuit 31 into the entry number of the
instruction history table 62. The value indicating that the store instruction
has been executed by the self-processor is read out from the entry of the
instruction history table 62, which is indicated by the entry number as
converted. In addition, if the store instruction has been executed by the
self-processor or the other processor, the address of the store instruction is
converted by the hash function circuit 31 into the entry number of the
instruction history table 62. The value indicating that the load instruction
has been executed by the self-processor is read out from the entry of the
instruction history table 62, which is indicated by the entry number as
converted. If the load instruction has been executed by the self-processor,
the address of the load instruction is converted by the hash function circuit
31 into the entry number of the instruction history table 62. The value
indicating that the load instruction has been executed by the self-processor
is stored into the entry of the instruction history table 62, which is indicated
by the entry number as converted. In addition, if the store instruction has
been executed by the self-processor, the address of the store instruction is
converted by the hash function circuit 31 into the entry number of the
instruction history table 62. The value indicating that the store instruction
has been executed by the self-processor is stored into the entry of the
instruction history table 62, which is indicated by the entry number as
converted. As a result, the possibility of the presence of the read after write
dependence from the store instruction executed by the self-processor or the
other processor to the load instruction executed by the self-processor, the
possibility of the presence of the write after read dependence from the load
instruction executed by the self-processor or the other processor to the store
instruction executed by the self-processor, and the possibility of the
presence of the write after write dependence from the store instruction
executed by the self-processor or the other processor to the store instruction
executed by the self-processor are detected.
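
The combined detection described above can be sketched as follows. The
sketch assumes that each entry keeps two separate one-bit values, one for
loads and one for stores executed by the self-processor; the text does not
state how the two values are held, so this layout, the class name
CombinedDetector and its method names are assumptions for illustration.

```python
from typing import Dict


class CombinedDetector:
    """Sketch of read after write, write after read and write after write checks."""

    def __init__(self, entry_bits: int = 4) -> None:
        size = 1 << entry_bits
        self.mask = size - 1
        self.loaded = [0] * size   # a load was executed by the self-processor
        self.stored = [0] * size   # a store was executed by the self-processor

    def _entry(self, address: int) -> int:
        # Assumed hash: low-order bits of the 4-byte word address.
        return (address >> 2) & self.mask

    def on_load(self, address: int, by_self: bool) -> Dict[str, bool]:
        e = self._entry(address)
        found = {"WAR": bool(self.stored[e])}     # earlier self store on this entry
        if by_self:
            self.loaded[e] = 1
        return found

    def on_store(self, address: int, by_self: bool) -> Dict[str, bool]:
        e = self._entry(address)
        # Read out first, then record, so a store does not detect itself.
        found = {"RAW": bool(self.loaded[e]), "WAW": bool(self.stored[e])}
        if by_self:
            self.stored[e] = 1
        return found


det = CombinedDetector()
det.on_load(0x100, by_self=True)
raw = det.on_store(0x100, by_self=False)   # {"RAW": True, "WAW": False}
det.on_store(0x204, by_self=True)
war = det.on_load(0x204, by_self=False)    # {"WAR": True}
waw = det.on_store(0x204, by_self=False)   # {"RAW": False, "WAW": True}
```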

The data dependence detector 30 is capable of detecting the read
after write dependence between the threads but incapable of detecting the
read after write dependence between the instructions in the same thread. The
data dependence detector 30 allows the data dependence speculative
execution in the thread units in the thread parallel processings but does not
allow the data dependence speculative execution in the instruction units in
the thread.

The data dependence detector 60 is capable of both the data
dependence speculative execution in the thread units in the thread parallel
processings and the data dependence speculative execution in the
instruction units in the thread. The data dependence detector 60 provides
the effect that the hardware size necessary for the data dependence
speculative execution is small. The number of the load instructions which
may be subjected to the data dependence speculative execution is not
limited by the hardware size. The improvement in processing speed by both
the data dependence speculative execution in the thread units in the thread
parallel processings and the data dependence speculative execution in the
instruction units in the thread may thus be obtained with a small hardware
size. The time necessary for detecting the data dependence is short.
The high speed and high frequency performances of the processor may be
improved.

In accordance with the above-described present invention, the
processor including the data dependence detector may be realized as a
semiconductor integrated circuit (LSI) on a semiconductor substrate.

Although the invention has been described above in connection
with several preferred embodiments thereof, it will be appreciated that
those embodiments have been provided solely for illustrating the invention,
and not in a limiting sense. Numerous modifications and substitutions of
equivalent materials and techniques will be readily apparent to those skilled
in the art after reading the present application, and all such modifications
and substitutions are expressly understood to fall within the true scope and
spirit of the appended claims.